Bayesian Model Selection of Autoregressive Processes

Authors

  • J. Vermaak
  • C. Andrieu
Abstract

This paper poses the problem of model order determination of an autoregressive (AR) process within a Bayesian framework. Several original hierarchical prior models are proposed that allow the stability of the model to be enforced and account for a possible unknown initial state. Obtaining the posterior model order probabilities requires integration of the resulting posterior distribution, an operation which is analytically intractable. Here stochastic reversible jump Markov chain Monte Carlo (MCMC) algorithms are developed to perform the required integration by simulating from the posterior distribution. The methods developed are evaluated in simulation studies on a number of synthetic and real data sets, and compared to standard model selection criteria.

I Introduction

When fitting an autoregressive (AR) model to real data, e.g. speech, the correct model order is often unknown. Using too low a value for the model order leads to an undesired smoothing of the data, whereas too high a value leads to overfitting. AR model order determination falls within the larger class of model selection problems. In signal processing, classical methods to solve these problems include e.g. AIC [1] and MDL [17], which rely on information theoretic criteria, and BIC [21], which is approximately an asymptotic Bayes factor. These methods, however, often fail when the data sets are small, and for AR processes AIC is not asymptotically consistent. They are also not easily modified to accommodate constraints, such as enforcing the stability of the model.

In this paper the model selection problem for AR processes is posed within a Bayesian framework. Within this framework the unknown parameters are assumed to be distributed according to a prior distribution, which incorporates all the available information about the process. Here original hierarchical prior models are proposed that allow the stability of the model to be enforced and account for a possible unknown initial state. All the information concerning the parameters characterising the model is then contained in the posterior distribution, which follows from an application of Bayes' rule. For the model selection problem the particular quantities of interest are the marginal posterior model probabilities. These quantities are obtained, at least in theory, by integrating, for each model, the full posterior distribution over the unknown model parameters, an operation which is analytically intractable in general. This integration problem is commonly solved by applying asymptotic approximations (in the size of the data set) and invoking Laplace's method of integration. Quite often, however, the approximations obtained in this way may be poor, leading to incorrect results, especially if the data sets are small.

Recently, interesting alternatives for integration have emerged, most of which are based on Markov chain Monte Carlo (MCMC) sampling techniques (see e.g. [18] for an introduction to MCMC methods). The key idea of these methods is to construct and run an ergodic Markov chain whose invariant distribution is the posterior distribution of interest. Samples from this chain can then be used to approximate the posterior distribution and its features of interest. MCMC simulation was originally proposed as a method for drawing samples from non-conventional distributions, hence being restricted to problems where the distributions have fixed dimensionalities.
In [10] it was shown that the MCMC methodology can be generalised to cope with problems where the models are also unknown. Coining the phrase reversible jump MCMC, this approach establishes a flexible framework for constructing reversible Markov chain samplers, where moves between different parameter spaces are also allowed. Thus, the Markov chain not only moves within parameter spaces that correspond to particular models, but also jumps between them, notwithstanding their (possible) difference in dimensionality.

Given the importance of the AR model selection problem, several authors have already proposed Bayesian models and MCMC-based computational methods. Fixed-dimension MCMC algorithms are proposed in e.g. [4, 12], whereas reversible jump MCMC algorithms can be found in e.g. [3, 24]. In [4] the AR coefficients are reparameterised in terms of the reflection coefficients. All the coefficients up to a specified maximum order are considered, and model selection is performed by associating a binary indicator variable with each coefficient, and using these to perform subset selection. In [12] a prior structure is defined directly on the roots of the AR characteristic polynomial. Model uncertainty is implicitly accounted for by allowing the roots to have zero moduli. The reversible jump MCMC strategy in [3] utilises the fact that, under the parameterisation in terms of the reflection coefficients, the AR model can be treated as a nested model, and employs model moves that increase or decrease the model dimension by one, with the coefficients common to the current and proposed models remaining unchanged. In contrast with this, the strategy in [24] allows model jumps of arbitrary magnitude and proposes the new coefficient values from the corresponding posterior conditional distribution. Stability is, however, not enforced.

In this paper reversible jump MCMC algorithms are developed to solve the AR model selection problem. Novel Bayesian models are proposed that enforce the stability of the model and allow for an unknown initial state, should it be required. These models are similar to the hierarchical structure adopted in e.g. [16]. The prior distributions are vague and uninformative, and appear to be robust to choices of the fixed parameters. The resulting estimation algorithms make the best use of the statistical structure of the Bayesian models, and are computationally efficient. The algorithms are shown to perform well on a variety of model selection problems using synthetic and real data sets.

The remainder of the paper is organised as follows. In Section II the signal model is specified and an alternative parameterisation in terms of the reflection coefficients is proposed for the case where stability is to be enforced. Section III presents the Bayesian models and estimation objectives. Reversible jump MCMC algorithms to simulate from the resulting posterior distributions are developed in Section IV. Section V reports the results of simulation studies on synthetic and real data sets. Finally, in Section VI, some conclusions are reached. All the notation and abbreviations used in this paper are detailed in Appendix A. Appendix B proves the geometric ergodicity of the MCMC algorithm to sample the unknown parameters, and Appendix C summarises the important Levinson recursions.

II Signal Model

Let $x_{1:T} \triangleq (x_1, \ldots, x_T)$ denote an observed vector of $T$ real samples. The elements of $x_{1:T}$ are here assumed to be represented by one of the models $\mathcal{M}_k$, $k \in K \triangleq \{0, \ldots, k_{\max}\}$, with $\mathcal{M}_k$ corresponding to a $k$-th order AR process, i.e.
$\mathcal{M}_0: \; x_t = \sigma_0 e_t, \quad k = 0 \quad (1)$

$\mathcal{M}_k: \; x_t = \sum_{i=1}^{k} a_{i,k} x_{t-i} + \sigma_k e_t, \quad k \in \{1, \ldots, k_{\max}\}, \quad (2)$

where $a_k \triangleq (a_{1,k}, \ldots, a_{k,k})$ are the coefficients of the $k$-th order process, $\sigma_k^2$ is the excitation variance, and $e_t \sim_{\mathrm{iid}} \mathcal{N}(0,1)$. The AR equation in (2) can be rewritten in vector-matrix form as

$\sigma_k e_{1:T} = x_{1:T} - X_k a_k = A_k x_{-k+1:T}, \quad (3)$

where the matrices $X_k \in \mathbb{R}^{T \times k}$ and $A_k \in \mathbb{R}^{T \times (T+k)}$ are defined as

$X_k \triangleq \begin{bmatrix} x_0 & x_{-1} & \cdots & x_{-k+1} \\ x_1 & x_0 & \cdots & x_{-k+2} \\ \vdots & \vdots & & \vdots \\ x_{T-1} & x_{T-2} & \cdots & x_{T-k} \end{bmatrix}, \qquad A_k \triangleq \begin{bmatrix} -a_{k,k} & \cdots & -a_{1,k} & 1 & 0 & \cdots & 0 \\ 0 & -a_{k,k} & \cdots & -a_{1,k} & 1 & \ddots & \vdots \\ \vdots & \ddots & \ddots & & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & -a_{k,k} & \cdots & -a_{1,k} & 1 \end{bmatrix}. \quad (4)$

In addition the constraint $k_{\max} < T$ is imposed here. If this is not the case the columns of $X_k$ are necessarily linearly dependent and the model parameters are not uniquely determined by the data. The likelihood function for the AR model parameters follows from a straightforward random variable transformation from $e_{1:T}$ to $x_{1:T}$, and is given by

$p(x_{1:T} \mid k, a_k, \sigma_k^2, x_{0,k}) = \mathcal{N}(x_{1:T}; X_k a_k, \sigma_k^2 I_T), \quad (5)$

where $x_{0,k} \triangleq (x_0, \ldots, x_{-k+1})$ denotes the initial state of the AR process. In many practical applications the initial state is known. This is, for example, the case in the frame-based processing of speech, where the initial state for the current frame can be obtained from the preceding frame. Even in cases where the initial state is unknown, practical data sets are often large enough so that their initial samples can be taken as the initial state of the AR process. However, in a few important applications the data sets are too small to sacrifice any of the available samples in this manner. For optimal performance in these cases the initial state should be considered as an unknown random quantity to be estimated alongside the other unknown parameters. Estimation for both a known and an unknown initial state is considered in this paper.

Stability Issues. Stability of the AR process is often easier to check and enforce by reparameterising the AR coefficients in terms of the reflection coefficients, here denoted by $\rho_k \triangleq (\rho_1, \ldots, \rho_k)$. For the process to be stable each of the reflection coefficients has to lie within $(-1, 1)$. The AR and reflection coefficients are related through the well-known Levinson recursions (see e.g. [6]), here denoted by $a_k = L(\rho_k)$. For the sake of convenience these recursions are summarised in Appendix C. The Levinson recursions define a non-linear invertible mapping between the AR and reflection coefficients. At the maximum likelihood (ML) estimate for the AR coefficients, given by

$\hat{a}_k^{\mathrm{ML}} \triangleq (X_k^{\mathrm{T}} X_k)^{-1} X_k^{\mathrm{T}} x_{1:T}, \quad (6)$

the following linear invertible transformation holds true:

$\hat{\rho}_k^{\mathrm{ML}} = U_k^{-1} \hat{a}_k^{\mathrm{ML}}, \quad (7)$

where $\hat{\rho}_k^{\mathrm{ML}}$ is the ML estimate of the reflection coefficients, and $(X_k^{\mathrm{T}} X_k)^{-1} = U_k \Delta_k^2 U_k^{\mathrm{T}}$ is obtained from the Cholesky factorisation of $(X_k^{\mathrm{T}} X_k)^{-1}$, with $U_k$ an upper triangular matrix with unit diagonal, and $\Delta_k^2 \triangleq \mathrm{diag}(\delta_{1,k}^2, \ldots, \delta_{k,k}^2)$ a diagonal matrix with positive entries (see [8] for details). Note that for (7) to be valid, the estimated data covariance matrix $X_k^{\mathrm{T}} X_k$ is required to be Toeplitz. [Footnote 1: This assumption is not strictly required for the strategies developed here. An explanation is given in Section IV-C.] This can be achieved by forcing the signal to be zero for $t < 1$ and $t > T$, as in e.g. [11], or by assuming the data to be circular, i.e. $x_{T-k+1:T} = x_{-k+1:0}$. The transformation in (7) also holds approximately for AR coefficient vectors in the vicinity of the ML estimate. A small numerical sketch of the signal model follows.
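The following is a minimal sketch (not from the paper; NumPy assumed, function names mine) of simulating the AR process in (2), building the matrix $X_k$ of (4), and evaluating the log of the likelihood (5):

```python
import numpy as np

def simulate_ar(a, sigma2, T, x0, rng):
    # x0 holds (x_0, x_{-1}, ..., x_{-k+1}); a holds (a_{1,k}, ..., a_{k,k}).
    k = len(a)
    buf = list(x0)                     # buf[0] is the most recent sample
    out = np.empty(T)
    for t in range(T):
        xt = np.dot(a, buf[:k]) + np.sqrt(sigma2) * rng.standard_normal()
        out[t] = xt
        buf.insert(0, xt)
    return out

def design_matrix(x, x0, k):
    # Builds the T x k matrix X_k of (4): row t contains (x_{t-1}, ..., x_{t-k}).
    full = np.concatenate([np.asarray(x0)[::-1], x])   # (x_{-k+1}, ..., x_T)
    T = len(x)
    return np.column_stack([full[k - 1 - i: k - 1 - i + T] for i in range(k)])

def log_likelihood(x, Xk, a, sigma2):
    # Gaussian log-likelihood (5): N(x_{1:T}; X_k a_k, sigma_k^2 I_T).
    resid = x - Xk @ a
    return -0.5 * len(x) * np.log(2 * np.pi * sigma2) - resid @ resid / (2 * sigma2)
```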
Extensive use of this approximation is made in Section IV-C, where an efficient reversible jump MCMC algorithm is designed to simulate from the posterior distribution of the AR parameters in the case where stability is to be enforced.

III Bayesian Models

Denote by $\theta_k \in \Theta_k$ the parameter vector associated with the model indexed by $k \in K$, here defined as

$\theta_0 \triangleq (\sigma_0^2, \delta^2, \lambda, \alpha^2), \quad k = 0 \quad (8)$

$\theta_k \triangleq (\vartheta_k, \sigma_k^2, \delta^2, \lambda, x_{0,k}, \alpha^2), \quad k \in \{1, \ldots, k_{\max}\}, \quad (9)$

where $\vartheta_k = \rho_k$ if stability is to be enforced and $\vartheta_k = a_k$ otherwise, and $\delta^2, \lambda, \alpha^2$ are hyper-parameters whose meaning will become evident later. In cases where the initial state is known, the quantities $(x_{0,k}, \alpha^2)$ disappear from the parameter vector. The overall parameter space can be written as a countable union of subspaces, i.e. $\Theta = \bigcup_{k=0}^{k_{\max}} \{k\} \times \Theta_k$, where

$\Theta_0 \triangleq \mathbb{R}^+ \times \mathbb{R}^+ \times \mathbb{R}^+ \times \mathbb{R}^+, \quad k = 0 \quad (10)$

$\Theta_k \triangleq \Xi_k \times \mathbb{R}^+ \times \mathbb{R}^+ \times \mathbb{R}^+ \times \mathbb{R}^k \times \mathbb{R}^+, \quad k \in \{1, \ldots, k_{\max}\}, \quad (11)$

where $\Xi_k = (-1,1)^k$ if stability is to be enforced and $\Xi_k = \mathbb{R}^k$ otherwise. Within a Bayesian framework all the information of interest concerning the unknown parameters is contained in the posterior distribution $p(k, \theta \mid x_{1:T})$, which may be expressed as

$p(k, \theta \mid x_{1:T}) = \sum_{i=0}^{k_{\max}} p(i, \theta_i \mid x_{1:T})\, \mathbb{I}_{\{i\} \times \Theta_i}(k, \theta). \quad (12)$

Note that $(k, \theta)$ is in one of the spaces $\{i\} \times \Theta_i$, $i \in K$, and the probability of $k$ being equal to $i$ and $\theta$ being in an infinitesimal set $d\theta_i$ centred around $\theta_i$ is $p(i, \theta_i \mid x_{1:T})\, d\theta_i$. The distributions on the right-hand side of (12) follow from Bayes' rule as

$p(k, \theta_k \mid x_{1:T}) \propto p(x_{1:T} \mid k, \theta_k)\, p(k, \theta_k), \quad k \in K. \quad (13)$

In the above $p(x_{1:T} \mid k, \theta_k)$ is the likelihood function, here given by (5), and $p(k, \theta_k)$ is the prior distribution of the unknown parameters of the $k$-th model. A hierarchical prior structure of the form

$p(k, \theta_k) \triangleq p(x_{0,k} \mid k, \sigma_k^2, \alpha^2)\, p(\vartheta_k, \sigma_k^2 \mid k, \delta^2)\, p(k \mid \lambda)\, p(\delta^2)\, p(\lambda)\, p(\alpha^2) \quad (14)$

is adopted here, where

$p(x_{0,k} \mid k, \sigma_k^2, \alpha^2) \triangleq \mathcal{N}(x_{0,k};\, 0_{k \times 1},\, \alpha^2 \sigma_k^2 \Sigma_{x_{0,k}}) \quad (15)$

$p(\vartheta_k, \sigma_k^2 \mid k, \delta^2) \triangleq p(\vartheta_k \mid k, \sigma_k^2, \delta^2)\, p(\sigma_k^2) \triangleq c_k\, \mathcal{N}(\vartheta_k;\, 0_{k \times 1},\, \delta^2 \sigma_k^2 \Sigma_{\vartheta_k})\, \mathbb{I}_{\Xi_k}(\vartheta_k)\; \mathcal{IG}(\sigma_k^2;\, \nu_0, \gamma_0) \quad (16)$

$p(k \mid \lambda) \triangleq c_\lambda\, \frac{\lambda^k}{k!}\, \mathbb{I}_K(k) \quad (17)$

$p(\delta^2) \triangleq \mathcal{IG}(\delta^2;\, \nu_{\delta^2}, \gamma_{\delta^2}) \quad (18)$

$p(\lambda) \triangleq \mathcal{G}a(\lambda;\, \nu_\lambda, \gamma_\lambda) \quad (19)$

$p(\alpha^2) \triangleq \mathcal{IG}(\alpha^2;\, \nu_{\alpha^2}, \gamma_{\alpha^2}). \quad (20)$

Similar to what was done for linear regression models before in e.g. [9, 22], the matrix $\Sigma_{\vartheta_k}$ is here set to $\Sigma_{a_k} = (X_k^{\mathrm{T}} X_k)^{-1}$ if $\vartheta_k = a_k$, or $\Sigma_{\rho_k} = \Delta_k^2$ if $\vartheta_k = \rho_k$. For this choice of $\Sigma_{a_k}$ the prior distribution $p(a_k, \sigma_k^2 \mid k, \delta^2)$ corresponds to the popular g-prior distribution (see [25] for a motivation). It can also be derived using maximum entropy arguments (see [2] for details). Note that the prior for $\vartheta_k$ depends on the data. Experiments were also performed with $\Sigma_{\vartheta_k} = I_k$, but no significant difference in the final results was observed. The hyper-parameters $\delta^2$ and $\lambda$ can respectively be interpreted as the expected signal energy, normalised by the excitation variance, and an approximation of the mean expected model order.

Remark 1. The hyper-parameter $\delta^2$ also plays a crucial role in the reversible jump MCMC model selection algorithms developed here. During a model move the generic expression for the acceptance ratio in (41) leads to the acceptance probability for this move being dependent on the ratio of the normalising constants of the prior distributions over the parameter spaces for the proposed and current models. Under normal circumstances this ratio depends only on the form of the priors and the dimensionality of the corresponding parameter spaces, and not on the data, and hence introduces an undesired bias in the estimation strategy.
However, by allowing the priors to depend on some hyper-parameter, such as $\delta^2$, and estimating this hyper-parameter from the data alongside the other parameters, this undesired bias can be eliminated.

In the case where $\vartheta_k = a_k$ and $\Xi_k = \mathbb{R}^k$, the normalising constant for the prior on $\vartheta_k$ is equal to one, i.e. $c_k = 1$. If stability is to be enforced, i.e. $\vartheta_k = \rho_k$ and $\Xi_k = (-1,1)^k$, and $\Sigma_{\rho_k} = \Delta_k^2$, the normalising constant follows straightforwardly as

$c_k^{-1} = \prod_{i=1}^{k} \mathrm{erf}\!\left(\frac{1}{\sqrt{2 \delta^2 \sigma_k^2 \delta_{i,k}^2}}\right), \quad (21)$

since the covariance matrix of the prior distribution is diagonal. In the above the Gaussian error function is defined as $\mathrm{erf}(x) \triangleq 2\pi^{-1/2} \int_0^x \exp(-s^2)\, ds$. Note that in this case $c_k$ is a complex non-linear function of $\sigma_k^2$ and $\delta^2$. The prior on the model order in (17) is a truncated Poisson distribution. The normalising constant for this distribution follows straightforwardly as

$c_\lambda^{-1} = \sum_{k \in K} \frac{\lambda^k}{k!}. \quad (22)$

When stationarity is assumed, the prior distribution on the initial state is normally taken to be the initial state likelihood (see e.g. [5]). To obtain this distribution the hyper-parameter $\alpha^2$ is set to one, and the matrix $\Sigma_{x_{0,k}}$ is taken to be the covariance matrix of $k$ samples drawn from an AR process with coefficients $a_k$ and unit excitation variance. The elements of $\Sigma_{x_{0,k}}$ are then complex non-linear functions of the AR coefficients. Realising that the initial state can be quite arbitrary in practice, no such structure is imposed here. Instead, $\Sigma_{x_{0,k}}$ is taken to be the identity matrix, i.e. $\Sigma_{x_{0,k}} = I_k$, and the hyper-parameter controlling the variance of the prior on the initial state, $\alpha^2$, is considered a random variable to be estimated alongside the other parameters. In the prior specification the quantities $(\nu_0, \gamma_0, \nu_{\delta^2}, \gamma_{\delta^2}, \nu_\lambda, \gamma_\lambda, \nu_{\alpha^2}, \gamma_{\alpha^2})$ are assumed to be fixed and known, and can be set to reflect vague or informative prior information concerning the corresponding unknown parameters. The effects of different choices for these parameters are investigated in Section V-A.

For model selection purposes the quantities of interest are the posterior model order probabilities $p(k \mid x_{1:T})$, for all $k \in K$. These are obtained, at least in theory, by integrating the corresponding posterior distributions in (13) over $\Theta_k$. The aim here is to find the value of $k$ for which the posterior model order probability distribution is maximised, i.e.

$\hat{k}_{\mathrm{MMAP}} \triangleq \arg\max_{k \in K} \{p(k \mid x_{1:T})\}. \quad (23)$

For the model under investigation here the integration to obtain $p(k \mid x_{1:T})$ is analytically intractable. The next section develops reversible jump MCMC algorithms to solve this problem.

IV MCMC Computation for Model Order Estimation

This section develops iterative numerical algorithms to perform the integration necessary to solve the estimation (or detection) problem posed in (23). This integration can be solved implicitly by obtaining a set of i.i.d. samples $\{(k^{(i)}, \theta^{(i)}_{k^{(i)}}) : i = 1, \ldots, N\}$ distributed according to $p(k, \theta \mid x_{1:T})$. The samples $\{k^{(i)} : i = 1, \ldots, N\}$ are then marginally distributed according to $p(k \mid x_{1:T})$, so that an empirical estimate of this distribution can be obtained as

$\hat{p}_N(k = j \mid x_{1:T}) \triangleq \frac{1}{N} \sum_{i=1}^{N} \mathbb{I}_{\{j\}}(k^{(i)}), \quad \text{for all } j \in K. \quad (24)$

An approximate solution to (23) is then simply the value of $k$ that maximises (24); a small sketch of this estimator is given below. The samples can also be used to obtain approximate estimates of the model parameters and predictions of future observations.
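As a concrete illustration of (23) and (24), the following sketch (my own, assuming NumPy and a vector of post-burn-in model-order samples) computes the empirical posterior model-order probabilities and the approximate MMAP order:

```python
import numpy as np

def posterior_model_probs(k_samples, k_max):
    # Empirical estimate (24): fraction of post-burn-in samples with k = j.
    k_samples = np.asarray(k_samples)
    return np.array([(k_samples == j).mean() for j in range(k_max + 1)])

def k_mmap(k_samples, k_max):
    # Approximate MMAP model order (23): mode of the empirical distribution.
    return int(np.argmax(posterior_model_probs(k_samples, k_max)))
```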
For example, the minimum mean square error (MMSE) estimate of the parameters of the model indexed by, say, $k = k^*$, and the one-step-ahead prediction, defined as

$\hat{\theta}_{k^*}^{\mathrm{MMSE}} \triangleq \mathbb{E}_{p(\theta_k \mid k = k^*, x_{1:T})}[\theta_k] \quad (25)$

$\hat{x}_{T+1}^{\mathrm{MMSE}} \triangleq \mathbb{E}_{p(x_{T+1} \mid x_{1:T})}[x_{T+1}] = \sum_{k \in K} \int_{\Theta_k} \mathbf{x}_k^{\mathrm{T}} a_k\; p(k, d\theta_k \mid x_{1:T}), \quad (26)$

respectively, with $\mathbf{x}_k \triangleq (x_T, \ldots, x_{T-k+1})$, can be approximated using the samples as

$\hat{\theta}_{k^*}^{\mathrm{MMSE}} \approx \frac{\sum_{i=1}^{N} \theta^{(i)}_{k^{(i)}}\, \mathbb{I}_{\{k^*\}}(k^{(i)})}{\sum_{i=1}^{N} \mathbb{I}_{\{k^*\}}(k^{(i)})} \quad (27)$

$\hat{x}_{T+1}^{\mathrm{MMSE}} \approx \frac{1}{N} \sum_{i=1}^{N} \mathbf{x}_{k^{(i)}}^{\mathrm{T}} a^{(i)}_{k^{(i)}}. \quad (28)$

Note that in (26) the prediction is performed by averaging over all the possible models and their parameters. This is in contrast to standard methods, which condition on a single model, normally obtained by ML estimation. Following from the strong law of large numbers (SLLN), the approximate estimates in (27) and (28) converge to their true values in (25) and (26) as the number of independent samples $N$ increases to infinity.

Unfortunately, obtaining i.i.d. samples from $p(k, \theta \mid x_{1:T})$ directly is not straightforward, and alternative sampling strategies need to be investigated. Here MCMC algorithms are developed to sample from the posterior distribution. Classical MCMC methods are unable to handle the problem of dimension change implied by considering models with associated parameter vectors defined over spaces of differing dimensions. However, a general framework, viz. reversible jump MCMC, has been proposed in [10] that allows the design of efficient algorithms for Bayesian model selection. This method is a general state-space Metropolis-Hastings (MH) algorithm, where candidate samples are proposed from a set of proposal distributions, and randomly accepted according to an acceptance ratio which ensures reversibility, and hence invariance of the Markov chain w.r.t. the desired posterior distribution. The chain moves across subspaces of (possibly) differing dimension by reversible model moves that achieve proper dimension matching (see [10] for details). The resulting samples are not strictly independent, but if the Markov chain is invariant w.r.t. the posterior distribution and ergodic (see Section IV-E) a SLLN exists, so that the approximations in (24) and (27) remain valid.

A MCMC Algorithm

A general MCMC algorithm employing Gibbs-like steps to sample from the posterior distribution $p(k, \theta \mid x_{1:T})$ proceeds as follows.

Algorithm 1 (MCMC Algorithm to Simulate from $p(k, \theta \mid x_{1:T})$)
1. Initialise $(k^{(0)}, \vartheta^{(0)}_{k^{(0)}}, \sigma^{2(0)}_{k^{(0)}}, \delta^{2(0)}, \lambda^{(0)}, x^{(0)}_{0,k^{(0)}}, \alpha^{2(0)}) \in \Theta$ randomly or deterministically and set $i = 1$.
2. Iteration $i$:
   (a) Simulate the AR model parameters: $(k^{(i)}, \vartheta^{(i)}_{k^{(i)}}, \sigma^{2(i)}_{k^{(i)}}, x^{(i)}_{0,k^{(i)}}) \sim p(k, \vartheta_k, \sigma_k^2, x_{0,k} \mid \delta^{2(i-1)}, \lambda^{(i-1)}, \alpha^{2(i-1)}, x_{1:T})$.
   (b) Simulate the hyper-parameters: $\delta^{2(i)} \sim p(\delta^2 \mid k^{(i)}, \vartheta^{(i)}_{k^{(i)}}, \sigma^{2(i)}_{k^{(i)}})$; $\lambda^{(i)} \sim p(\lambda \mid k^{(i)})$; $\alpha^{2(i)} \sim p(\alpha^2 \mid k^{(i)}, \sigma^{2(i)}_{k^{(i)}}, x^{(i)}_{0,k^{(i)}})$.
3. Set $i \leftarrow i + 1$ and go to 2.

In the ensuing discussion the superscript $(i)$ is dropped from all variables when there is no danger of any ambiguities arising.

Remark 2. If the initial state is fixed and known, the quantities $(x_{0,k}, \alpha^2)$ do not need to be sampled, and consequently disappear from the above algorithm.

Since simulating from $p(k, \vartheta_k, \sigma_k^2, x_{0,k} \mid \delta^2, \lambda, \alpha^2, x_{1:T})$ involves moving between subspaces that differ in dimension, it is here sampled from using a reversible jump MCMC step. This step proceeds differently for the two cases where stability is ignored or explicitly enforced. The following section considers the case where stability is not enforced. Algorithms are presented for both a fixed and known, and a random and unknown initial state.
Following this, the focus shifts to the case where stability is explicitly enforced. Finally, sampling strategies are proposed for the hyper-parameters, and some convergence results are given.

B Unconstrained Case

Recall that for the unconstrained case $\vartheta_k = a_k$ and $\Xi_k = \mathbb{R}^k$. The two cases where the initial state is either fixed and known or random and unknown are considered separately in what follows.

1 Fixed and Known Initial State

The prior distribution in (14) is conjugate to the likelihood function in (5) for the AR coefficients and the excitation variance, and consequently allows the posterior conditional for the AR parameters, which follows from Bayes' rule as

$p(k, a_k, \sigma_k^2 \mid \delta^2, \lambda, x_{1:T}) \propto p(x_{1:T} \mid k, a_k, \sigma_k^2)\, p(a_k, \sigma_k^2 \mid k, \delta^2)\, p(k \mid \lambda), \quad (29)$

to be integrated over these parameters to obtain an expression for $p(k \mid \delta^2, \lambda, x_{1:T})$. By performing this integration the posterior conditional can be factorised as

$p(k, a_k, \sigma_k^2 \mid \delta^2, \lambda, x_{1:T}) = p(a_k \mid k, \sigma_k^2, \delta^2, x_{1:T})\, p(\sigma_k^2 \mid k, \delta^2, x_{1:T})\, p(k \mid \delta^2, \lambda, x_{1:T}), \quad (30)$

where the distributions on the right-hand side are given by

$p(a_k \mid k, \sigma_k^2, \delta^2, x_{1:T}) = \mathcal{N}(a_k;\, \hat{a}_k, \sigma_k^2 M_k) \quad (31)$

$p(\sigma_k^2 \mid k, \delta^2, x_{1:T}) = \mathcal{IG}(\sigma_k^2;\, \nu_0 + T/2, \gamma_k) \quad (32)$

$p(k \mid \delta^2, \lambda, x_{1:T}) \propto \frac{\gamma_k^{-(\nu_0+T/2)} \lambda^k\, \mathbb{I}_K(k)}{(\delta^2)^{k/2}\, k!} \left(\frac{|M_k|}{|\Sigma_{a_k}|}\right)^{1/2} = \frac{\gamma_k^{-(\nu_0+T/2)} \lambda^k\, \mathbb{I}_K(k)}{(1+\delta^2)^{k/2}\, k!}, \quad (33)$

with

$M_k \triangleq \left(X_k^{\mathrm{T}} X_k + \frac{1}{\delta^2} \Sigma_{a_k}^{-1}\right)^{-1} = \frac{\delta^2}{1+\delta^2}\, (X_k^{\mathrm{T}} X_k)^{-1} \quad (34)$

$\hat{a}_k \triangleq M_k X_k^{\mathrm{T}} x_{1:T} \quad (35)$

$\gamma_k \triangleq \gamma_0 + \tfrac{1}{2}\, x_{1:T}^{\mathrm{T}} P_k x_{1:T} \quad (36)$

$P_k \triangleq I_T - X_k M_k X_k^{\mathrm{T}}. \quad (37)$

The simplifications in (33) and (34) follow directly from the definition of $\Sigma_{a_k}$ in Section III. Simulating from the posterior distribution in (30) requires simulation from each of the distributions on the right-hand side. This is straightforward for the posterior conditionals of the AR coefficients and excitation variance, since these distributions are of standard forms. The posterior conditional for the model order can be sampled from in a reversible jump MH step. The proposal for this step is here taken to be a discretised Laplacian distribution, i.e.

$q(k' \mid k) \triangleq s_k \exp(-\zeta |k' - k|), \quad (38)$

where $\zeta > 0$ is a scale parameter, and the normalising constant is given by

$s_k^{-1} = \sum_{k' \in K} \exp(-\zeta |k' - k|). \quad (39)$

Such a proposal was used before in e.g. [24], and ensures that most of the proposed model moves are small, without eliminating the possibility of the occasional large one. The corresponding MH acceptance probability is given by

$\alpha_u(k' \mid k) = \min\{1, r_u(k' \mid k)\}, \quad (40)$

where the acceptance ratio is deduced from the generic expression (see [10] for details)

$r \triangleq (\text{posterior ratio}) \times (\text{proposal ratio}) \times (\text{Jacobian}), \quad (41)$

which yields

$r_u(k' \mid k) = \frac{s_{k'}\, \lambda^{(k'-k)}\, k!}{s_k\, (\delta^2)^{(k'-k)/2}\, k'!} \left(\frac{\gamma_k}{\gamma_{k'}}\right)^{\nu_0+T/2} \left(\frac{|M_{k'}|\, |\Sigma_{a_k}|}{|M_k|\, |\Sigma_{a_{k'}}|}\right)^{1/2} = \frac{s_{k'}\, \lambda^{(k'-k)}\, k!}{s_k\, (1+\delta^2)^{(k'-k)/2}\, k'!} \left(\frac{\gamma_k}{\gamma_{k'}}\right)^{\nu_0+T/2}, \quad (42)$

since the Jacobian is unity. The simplification in (42) again follows from the definition of $\Sigma_{a_k}$. Note that the acceptance ratio becomes one if $k' = k$. To summarise, the sampling step for the AR parameters at iteration $i$ is presented below; a sketch of the acceptance-ratio computation follows after the algorithm.

Algorithm 2 (Sampling the AR Parameters: Unconstrained, Initial State Known)
1. Propose a new value for the model order $k' \sim q(k \mid k^{(i-1)})$ (see (38)).
2. If $u \sim \mathcal{U}_{[0,1]} \leq \alpha_u(k' \mid k^{(i-1)})$ (see (40)): set $k^{(i)} = k'$. Else: set $k^{(i)} = k^{(i-1)}$.
3. Simulate $\sigma^{2(i)}_{k^{(i)}} \sim p(\sigma_k^2 \mid k^{(i)}, \delta^{2(i-1)}, x_{1:T})$ (see (32)).
4. Simulate $a^{(i)}_{k^{(i)}} \sim p(a_k \mid k^{(i)}, \sigma^{2(i)}_{k^{(i)}}, \delta^{2(i-1)}, x_{1:T})$ (see (31)).
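Here is a hedged sketch of the log of the acceptance ratio (42), assuming NumPy/SciPy and the g-prior choice of Section III; `X_of` is a hypothetical callable returning $X_k$ of (4), and the simplified forms of (34) and (36) are used:

```python
import numpy as np
from scipy.special import gammaln

def log_ru(k_new, k_old, x, X_of, delta2, lam, nu0, gamma0, zeta, k_max):
    """Log of the acceptance ratio (42) for the move k_old -> k_new in
    Algorithm 2 (unconstrained case, known initial state); zeta is the
    scale of the discretised Laplacian proposal (38)."""
    def log_s(k):   # log of the normalising constant (39)
        ks = np.arange(k_max + 1)
        return -np.logaddexp.reduce(-zeta * np.abs(ks - k))
    def log_gamma_k(k):   # gamma_k of (36), with M_k = delta2/(1+delta2) (X'X)^{-1}
        if k == 0:
            return np.log(gamma0 + 0.5 * x @ x)
        Xk = X_of(k)
        ahat = np.linalg.solve(Xk.T @ Xk, Xk.T @ x) * delta2 / (1.0 + delta2)
        return np.log(gamma0 + 0.5 * (x @ x - x @ (Xk @ ahat)))
    T = len(x)
    return (log_s(k_new) - log_s(k_old)
            + (k_new - k_old) * (np.log(lam) - 0.5 * np.log1p(delta2))
            + gammaln(k_old + 1) - gammaln(k_new + 1)
            + (nu0 + T / 2) * (log_gamma_k(k_old) - log_gamma_k(k_new)))
```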
2 Random and Unknown Initial State

In this case conjugate analysis again allows integration over the AR coefficients and excitation variance, so that the posterior conditional for the AR parameters can be factorised as

$p(k, a_k, \sigma_k^2, x_{0,k} \mid \delta^2, \lambda, \alpha^2, x_{1:T}) = p(a_k \mid k, \sigma_k^2, \delta^2, x_{0,k}, x_{1:T})\, p(\sigma_k^2 \mid k, \delta^2, x_{0,k}, \alpha^2, x_{1:T})\, p(k, x_{0,k} \mid \delta^2, \lambda, \alpha^2, x_{1:T}), \quad (43)$

where $p(a_k \mid k, \sigma_k^2, \delta^2, x_{0,k}, x_{1:T})$ is given by (31), and the remaining distributions on the right-hand side follow as

$p(\sigma_k^2 \mid k, \delta^2, x_{0,k}, \alpha^2, x_{1:T}) = \mathcal{IG}(\sigma_k^2;\, \nu_k, \gamma_k) \quad (44)$

$p(k, x_{0,k} \mid \delta^2, \lambda, \alpha^2, x_{1:T}) \propto \frac{\Gamma(\nu_k)\, \gamma_k^{-\nu_k}\, \lambda^k\, \mathbb{I}_K(k)}{(2\pi \delta^2 \alpha^2)^{k/2}\, k!} \left(\frac{|M_k|}{|\Sigma_{a_k}|\, |\Sigma_{x_{0,k}}|}\right)^{1/2} = \frac{\Gamma(\nu_k)\, \gamma_k^{-\nu_k}\, \lambda^k\, \mathbb{I}_K(k)}{(2\pi \alpha^2 (1+\delta^2))^{k/2}\, k!}, \quad (45)$

with

$\nu_k \triangleq \nu_0 + \tfrac{1}{2}(T + k) \quad (46)$

$\gamma_k \triangleq \gamma_0 + \tfrac{1}{2}\left(x_{1:T}^{\mathrm{T}} P_k x_{1:T} + \frac{1}{\alpha^2}\, x_{0,k}^{\mathrm{T}} \Sigma_{x_{0,k}}^{-1} x_{0,k}\right) = \gamma_0 + \tfrac{1}{2}\left(x_{1:T}^{\mathrm{T}} P_k x_{1:T} + \frac{1}{\alpha^2}\, x_{0,k}^{\mathrm{T}} x_{0,k}\right). \quad (47)$

The simplifications in the above follow directly from the definitions of $\Sigma_{a_k}$ and $\Sigma_{x_{0,k}}$ in Section III. Simulating from the posterior distribution in (43) requires simulation from each of the distributions on the right-hand side. As before, this is straightforward for the posterior conditionals of the AR coefficients and excitation variance. Because of the difficulty associated with constructing efficient proposal distributions for the initial state for arbitrary model moves, the joint posterior conditional for the model order and initial state is here sampled from using a reversible jump MCMC step employing only three model moves, viz.
1. A birth move, selected with probability $b_k$, in which the dimension is increased from $k$ to $k+1$.
2. A death move, selected with probability $d_k$, in which the dimension is decreased from $k$ to $k-1$.
3. An update move, selected with probability $u_k$, in which the dimension remains fixed to $k$.

Thus, the reversible jump MCMC step is a mixture of the moves described above, i.e. at each iteration one of the candidate moves, viz. birth, death and update, is randomly chosen with corresponding probabilities $b_k$, $d_k$ and $u_k$, such that $b_k + d_k + u_k = 1$, for all $k \in K$. For $k = 0$ the death move is impossible, so that $d_0 = 0$, whereas for $k = k_{\max}$ the birth move is impossible, so that $b_{k_{\max}} = 0$. For all other values of $k$ the birth and death probabilities are taken to be

$b_k \triangleq \varsigma \min\left\{1, \frac{p(k+1 \mid \lambda)}{p(k \mid \lambda)}\right\} \quad (48)$

$d_{k+1} \triangleq \varsigma \min\left\{1, \frac{p(k \mid \lambda)}{p(k+1 \mid \lambda)}\right\}, \quad (49)$

where $\varsigma$ is a constant which tunes the proportion of dimension change moves vs. update moves. As pointed out in [10], this choice ensures that $b_k\, p(k \mid \lambda) / (d_{k+1}\, p(k+1 \mid \lambda)) = 1$, which means that a MH algorithm on the dimension alone would have a unity acceptance probability. Each of the moves is now considered in more detail.

For the update move the model order remains fixed, so that the posterior conditional in (45) reduces to

$p(x_{0,k} \mid k, \delta^2, \lambda, \alpha^2, x_{1:T}) \propto \gamma_k^{-\nu_k} \left(\frac{|M_k|}{|\Sigma_{a_k}|\, |\Sigma_{x_{0,k}}|}\right)^{1/2}. \quad (50)$

Assuming the model order to be fixed to $k$, a valid strategy to simulate from this distribution at iteration $i$ is summarised below.

Algorithm 3 (Update Move: Unconstrained, Initial State Unknown)
1. Set $x^{(0)}_{0,k} = x^{(i-1)}_{0,k}$.
2. For $j = 1, \ldots, k$: If $u \sim \mathcal{U}_{[0,1]} \leq \xi$: simulate $x^{(j)}_{0,k}$ in a MH step with (50) as invariant distribution and $q_{1,j}(x_{0,k} \mid x^{(j-1)}_{0,k})$ as proposal distribution. Else: simulate $x^{(j)}_{0,k}$ in a MH step with (50) as invariant distribution and $q_{2,j}(x_{0,k} \mid x^{(j-1)}_{0,k})$ as proposal distribution.
3. Set $x^{(i)}_{0,k} = x^{(k)}_{0,k}$.

The proposal distributions in the above algorithm are defined as

$q_{1,j}(x'_{0,k} \mid x_{0,k}) \triangleq \delta_{x_{-j,0,k}}(dx'_{-j,0,k})\, q_1(x'_{-j+1} \mid x'_{-j,0,k}) \quad (51)$

$q_{2,j}(x'_{0,k} \mid x_{0,k}) \triangleq \delta_{x_{-j,0,k}}(dx'_{-j,0,k})\, q_2(x'_{-j+1} \mid x_{-j+1}), \quad (52)$

where $x_{-j,0,k}$ denotes $x_{0,k}$ with $x_{-j+1}$ removed.
Thus, each element of the initial state is sampled individually using a mixture of two MH steps with proposal distributions as defined above, and mixture weights $\xi$ and $1-\xi$, $0 \leq \xi \leq 1$, respectively. The proposal distribution for the first MH step in (51) is constructed by making use of the reversible character of the AR process, meaning that the initial state values can be simulated according to

$x_t = \sum_{i=1}^{k} a_{i,k} x_{t+i} + \sigma_k e_t, \quad e_t \sim_{\mathrm{iid}} \mathcal{N}(0,1), \quad t = 0, -1, \ldots, -k+1, \quad (53)$

should the AR coefficients and excitation variance be known. This is not the case, so instead these parameters are substituted with their ML estimates, obtained by assuming the initial state to be zero, i.e.

$\tilde{a}_k^{\mathrm{ML}} \triangleq (\tilde{X}_k^{\mathrm{T}} \tilde{X}_k)^{-1} \tilde{X}_k^{\mathrm{T}} x_{1:T} \quad (54)$

$\tilde{\sigma}_k^{2\,\mathrm{ML}} \triangleq \frac{1}{T}\, x_{1:T}^{\mathrm{T}} \tilde{P}_k x_{1:T}, \quad (55)$

with $\tilde{P}_k \triangleq I_T - \tilde{X}_k (\tilde{X}_k^{\mathrm{T}} \tilde{X}_k)^{-1} \tilde{X}_k^{\mathrm{T}}$. The matrix $\tilde{X}_k$ is identical to $X_k$ in (4), but with the initial state values set to zero. The implied joint proposal distribution for all the elements of the initial state now follows as

$q_1(x'_{0,k}) \triangleq \mathcal{N}(x'_{0,k};\, m_{0,k}, \tilde{\sigma}_k^{2\,\mathrm{ML}} M_{0,k}), \quad (56)$

with

$M_{0,k} \triangleq (\tilde{A}_{0,k}^{\mathrm{T}} \tilde{A}_{0,k})^{-1} \quad (57)$

$m_{0,k} \triangleq -M_{0,k} \tilde{A}_{0,k}^{\mathrm{T}} \tilde{A}_{1,k} x_{k:1}, \quad (58)$

where $\tilde{A}_k \triangleq [\tilde{A}_{1,k}\; \tilde{A}_{0,k}] \in \mathbb{R}^{k \times 2k}$ is of the same form as $A_k$ in (4), but constructed with the ML estimate of the AR coefficients in (54). The matrices $\tilde{A}_{1,k}$ and $\tilde{A}_{0,k}$ are respectively the first $k$ and final $k$ columns of $\tilde{A}_k$. The required conditional proposal distributions $q_1(x'_{-j+1} \mid x'_{-j,0,k})$, $j = 1, \ldots, k$, are easily obtained from (56), since it is Gaussian. The proposal distribution for the second MH step in (52) is taken to be a Gaussian random walk, i.e.

$q_2(x'_{-j+1} \mid x_{-j+1}) \triangleq \mathcal{N}(x'_{-j+1};\, x_{-j+1}, \sigma^2_{\mathrm{RW}}), \quad (59)$

where $\sigma^2_{\mathrm{RW}}$ is the variance of the random walk. This proposal distribution is introduced to perform local exploration of the posterior distribution. For both MH steps the corresponding acceptance probability becomes

$\alpha_{i,j}(x'_{0,k} \mid x_{0,k}) = \min\{1, r_{i,j}(x'_{0,k} \mid x_{0,k})\}, \quad i = 1, 2, \quad (60)$

with the acceptance ratio following from (41) as

$r_{i,j}(x'_{0,k} \mid x_{0,k}) = \left(\frac{\gamma_k}{\gamma'_k}\right)^{\nu_k} \left(\frac{|M'_k|\, |\Sigma_{a_k}|\, |\Sigma_{x_{0,k}}|}{|M_k|\, |\Sigma'_{a_k}|\, |\Sigma'_{x_{0,k}}|}\right)^{1/2} \frac{q_{i,j}(x_{0,k} \mid x'_{0,k})}{q_{i,j}(x'_{0,k} \mid x_{0,k})} = \left(\frac{\gamma_k}{\gamma'_k}\right)^{\nu_k} \frac{q_{i,j}(x_{0,k} \mid x'_{0,k})}{q_{i,j}(x'_{0,k} \mid x_{0,k})}, \quad (61)$

where $\gamma'_k$, $M'_k$, $\Sigma'_{a_k}$ and $\Sigma'_{x_{0,k}}$ are similar to $\gamma_k$, $M_k$, $\Sigma_{a_k}$ and $\Sigma_{x_{0,k}}$ with $x_{0,k}$ replaced by $x'_{0,k}$, and the simplification again follows from the definitions of $\Sigma_{a_k}$ and $\Sigma_{x_{0,k}}$. Several other proposal distributions may be constructed, but the combination presented here proved to be very efficient in simulations (a code sketch of the random-walk step follows after Algorithm 5).

Numerous sampling strategies can be devised for the birth and death moves. The one presented below makes use of the proposal distributions developed for the update move, and was found to work well in practice. Assuming that the current state of the Markov chain is in $\{k\} \times \Theta_k$, the birth move at iteration $i$ is summarised below.

Algorithm 4 (Birth Move: Unconstrained, Initial State Unknown)
1. Set $x_{k+1,0,k+1} = (x_0^{(i-1)}, \ldots, x^{(i-1)}_{-k+1})$.
2. Propose a value for the new element of the initial state $x_{-k} \sim q_1(x_{-k} \mid x_{k+1,0,k+1})$ (see (51)) and set $x_{0,k+1} = (x_{k+1,0,k+1}, x_{-k})$.
3. If $u \sim \mathcal{U}_{[0,1]} \leq \alpha_{\mathrm{birth}}(k+1, x_{0,k+1} \mid k, x^{(i-1)}_{0,k})$ (see (62)): set $k^{(i)} = k+1$, $x^{(i)}_{0,k+1} = x_{0,k+1}$. Else: set $k^{(i)} = k$, $x^{(i)}_{0,k} = x^{(i-1)}_{0,k}$.

Similarly, assuming that the current state of the Markov chain is in $\{k+1\} \times \Theta_{k+1}$, the death move at iteration $i$ is summarised below.

Algorithm 5 (Death Move: Unconstrained, Initial State Unknown)
1. Set $x_{0,k} = (x_0^{(i-1)}, \ldots, x^{(i-1)}_{-k+1})$.
2. If $u \sim \mathcal{U}_{[0,1]} \leq \alpha_{\mathrm{death}}(k, x_{0,k} \mid k+1, x^{(i-1)}_{0,k+1})$ (see (63)): set $k^{(i)} = k$, $x^{(i)}_{0,k} = x_{0,k}$. Else: set $k^{(i)} = k+1$, $x^{(i)}_{0,k+1} = x^{(i-1)}_{0,k+1}$.
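As promised above, a minimal sketch (mine, assuming NumPy) of one random-walk MH step (52), (59)-(61) on a single initial-state element; `log_target` is a hypothetical callable evaluating the log of (50) up to a constant, so that the symmetric proposal cancels in (61):

```python
import numpy as np

def rw_update_element(x0, j, log_target, sigma2_rw, rng):
    # Propose x'_{-j+1} from the Gaussian random walk (59) and accept with
    # probability (60); the proposal is symmetric, so (61) reduces to a
    # ratio of target values.
    prop = x0.copy()
    prop[j] = x0[j] + np.sqrt(sigma2_rw) * rng.standard_normal()
    if np.log(rng.uniform()) < log_target(prop) - log_target(x0):
        return prop
    return x0
```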
For both the birth and death moves the model order and initial state are sampled in a MH step that admits (45) as invariant distribution. For the birth move a candidate for the initial state is obtained by setting the first $k$ elements to the corresponding elements of the current initial state, and proposing a candidate for the additional element from $q_1(x_{-k} \mid x_{k+1,0,k+1})$, conditional on the current values of the other elements. Similarly, a candidate for the initial state for the death move is obtained as the first $k$ elements of the current initial state. The corresponding MH acceptance probabilities are given by

$\alpha_{\mathrm{birth}}(k+1, x_{0,k+1} \mid k, x_{0,k}) = \min\{1, r_{\mathrm{birth}}(k+1, x_{0,k+1} \mid k, x_{0,k})\} \quad (62)$

$\alpha_{\mathrm{death}}(k, x_{0,k} \mid k+1, x_{0,k+1}) = \min\{1, r_{\mathrm{birth}}^{-1}(k+1, x_{0,k+1} \mid k, x_{0,k})\}, \quad (63)$

with the acceptance ratio following from the expression in (41) as

$r_{\mathrm{birth}}(k+1, x_{0,k+1} \mid k, x_{0,k}) = \frac{\Gamma(\nu_{k+1})\, \gamma_k^{\nu_k}}{\Gamma(\nu_k)\, \gamma_{k+1}^{\nu_{k+1}}\, (2\pi \delta^2 \alpha^2)^{1/2}} \left(\frac{|M_{k+1}|\, |\Sigma_{a_k}|\, |\Sigma_{x_{0,k}}|}{|M_k|\, |\Sigma_{a_{k+1}}|\, |\Sigma_{x_{0,k+1}}|}\right)^{1/2} \frac{1}{q_1(x_{-k} \mid x_{k+1,0,k+1})} = \frac{\Gamma(\nu_{k+1})\, \gamma_k^{\nu_k}}{\Gamma(\nu_k)\, \gamma_{k+1}^{\nu_{k+1}}\, (2\pi \alpha^2 (1+\delta^2))^{1/2}}\, \frac{1}{q_1(x_{-k} \mid x_{k+1,0,k+1})}, \quad (64)$

where the simplification again follows from the definitions of $\Sigma_{a_k}$ and $\Sigma_{x_{0,k}}$. To summarise, the sampling step for the AR parameters at iteration $i$ is presented below.

Algorithm 6 (Sampling the AR Parameters: Unconstrained, Initial State Unknown)
1. If $u \sim \mathcal{U}_{[0,1]} \leq b_{k^{(i-1)}}$: perform birth move (see Algorithm 4). Else if $u \leq b_{k^{(i-1)}} + d_{k^{(i-1)}}$: perform death move (see Algorithm 5). Else: perform update move (see Algorithm 3).
2. Simulate $\sigma^{2(i)}_{k^{(i)}} \sim p(\sigma_k^2 \mid k^{(i)}, \delta^{2(i-1)}, x^{(i)}_{0,k^{(i)}}, \alpha^{2(i-1)}, x_{1:T})$ (see (44)).
3. Simulate $a^{(i)}_{k^{(i)}} \sim p(a_k \mid k^{(i)}, \sigma^{2(i)}_{k^{(i)}}, \delta^{2(i-1)}, x^{(i)}_{0,k^{(i)}}, x_{1:T})$ (see (31)).

C Constrained Case

Recall that for the constrained case $\vartheta_k = \rho_k$ and $\Xi_k = (-1,1)^k$. For the sake of brevity only the case where the initial state is fixed and known is considered in what follows. A strategy similar to the one in Section IV-B.2 can be adopted if the initial state is unknown. The posterior conditional for the AR parameters follows from Bayes' rule as

$p(k, \rho_k, \sigma_k^2 \mid \delta^2, \lambda, x_{1:T}) \propto p(x_{1:T} \mid k, a_k = L(\rho_k), \sigma_k^2)\, p(\rho_k, \sigma_k^2 \mid k, \delta^2)\, p(k \mid \lambda). \quad (65)$

Sampling from this distribution using a reversible jump MCMC step similar to those developed for the unconstrained case is made difficult by the non-linear nature of the transformation between the AR and reflection coefficients. However, if the AR coefficients $a_k = L(\rho_k)$ in the likelihood function above are replaced by the approximation $a_k = U_k \rho_k$ introduced in Section II, a reversible jump MCMC strategy can be developed for the resulting approximate posterior distribution. The Markov chain transition kernel for this step can then be used as a proposal for the true posterior distribution in (65). The details of this approach are presented in what follows. The approximate posterior distribution, defined as

$p'(k, \rho_k, \sigma_k^2 \mid \delta^2, \lambda, x_{1:T}) \propto p(x_{1:T} \mid k, a_k = U_k \rho_k, \sigma_k^2)\, p(\rho_k, \sigma_k^2 \mid k, \delta^2)\, p(k \mid \lambda), \quad (66)$

can be marginalised over the reflection coefficients, and can thus be factorised as

$p'(k, \rho_k, \sigma_k^2 \mid \delta^2, \lambda, x_{1:T}) = p'(\rho_k \mid k, \sigma_k^2, \delta^2, x_{1:T})\, p'(k, \sigma_k^2 \mid \delta^2, \lambda, x_{1:T}), \quad (67)$

where the distributions on the right-hand side are given by

$p'(\rho_k \mid k, \sigma_k^2, \delta^2, x_{1:T}) = c'_k\, \mathcal{N}(\rho_k;\, \hat{\rho}_k, \sigma_k^2 Q_k)\, \mathbb{I}_{(-1,1)^k}(\rho_k) \quad (68)$

$p'(k, \sigma_k^2 \mid \delta^2, \lambda, x_{1:T}) \propto \frac{c_k\, \gamma_k^{-(\nu_0+T/2)} \lambda^k\, \mathbb{I}_K(k)}{c'_k\, (\delta^2)^{k/2}\, k!} \left(\frac{|Q_k|}{|\Sigma_{\rho_k}|}\right)^{1/2} \mathcal{IG}(\sigma_k^2;\, \nu_0 + T/2, \gamma_k) = \frac{c_k\, \gamma_k^{-(\nu_0+T/2)} \lambda^k\, \mathbb{I}_K(k)}{c'_k\, (1+\delta^2)^{k/2}\, k!}\; \mathcal{IG}(\sigma_k^2;\, \nu_0 + T/2, \gamma_k), \quad (69)$
with

$Q_k \triangleq \left(\Delta_k^{-2} + \frac{1}{\delta^2} \Sigma_{\rho_k}^{-1}\right)^{-1} = \frac{\delta^2}{1+\delta^2}\, \Delta_k^2 \triangleq \mathrm{diag}(q_{1,k}^2, \ldots, q_{k,k}^2) \quad (70)$

$\hat{\rho}_k \triangleq Q_k U_k^{\mathrm{T}} X_k^{\mathrm{T}} x_{1:T} \triangleq (\hat{\rho}_1, \ldots, \hat{\rho}_k) \quad (71)$

$\gamma_k \triangleq \gamma_0 + \tfrac{1}{2}\, x_{1:T}^{\mathrm{T}} P_k x_{1:T} \quad (72)$

$P_k \triangleq I_T - X_k U_k Q_k U_k^{\mathrm{T}} X_k^{\mathrm{T}} = I_T - \frac{\delta^2}{1+\delta^2}\, X_k (X_k^{\mathrm{T}} X_k)^{-1} X_k^{\mathrm{T}} \quad (73)$

$c_k'^{-1} = 2^{-k} \prod_{i=1}^{k} \left[\mathrm{erf}\!\left(\frac{1 - \hat{\rho}_i}{\sqrt{2 \sigma_k^2 q_{i,k}^2}}\right) + \mathrm{erf}\!\left(\frac{1 + \hat{\rho}_i}{\sqrt{2 \sigma_k^2 q_{i,k}^2}}\right)\right]. \quad (74)$

The simplifications in the above expressions follow directly from the definition of $\Sigma_{\rho_k}$ in Section III. Note that the normalising constants $c_k$ and $c'_k$ are complex non-linear functions of $\sigma_k^2$ and $\delta^2$, making further integration of (69) over the excitation variance impossible. The approximate posterior distribution in (67) can be sampled from using a reversible jump MH step with a proposal distribution of the form

$q'(k', \rho'_{k'}, \sigma_{k'}^{2\prime} \mid k, \rho_k, \sigma_k^2) \triangleq q'(k' \mid k)\, q'(\sigma_{k'}^{2\prime} \mid k')\, q'(\rho'_{k'} \mid k', \sigma_{k'}^{2\prime}), \quad (75)$

where $q'(k' \mid k)$ can be taken to be (38) or any other convenient model move proposal distribution, and

$q'(\sigma_k^2 \mid k) \triangleq \mathcal{IG}(\sigma_k^2;\, \nu_0 + T/2, \gamma_k) \quad (76)$

$q'(\rho_k \mid k, \sigma_k^2) \triangleq p'(\rho_k \mid k, \sigma_k^2, \delta^2, x_{1:T}). \quad (77)$

The proposal distribution for the reflection coefficients is a truncated Gaussian distribution, and can be sampled from using an accept/reject procedure that simulates from the untruncated distribution until a sample that falls within the stable region is obtained (a code sketch follows after Remark 3 below). Note that since the matrix $Q_k$ is diagonal, this sampling step can be performed for each reflection coefficient independently, utilising the corresponding marginal distributions of (68). For the proposal distribution in (75) the corresponding MH acceptance probability becomes

$\alpha'(k', \sigma_{k'}^{2\prime} \mid k) = \min\{1, r'(k', \sigma_{k'}^{2\prime} \mid k)\}, \quad (78)$

with the acceptance ratio following from the expression in (41) as

$r'(k', \sigma_{k'}^{2\prime} \mid k) = \frac{c_{k'}\, c'_k\, \lambda^{(k'-k)}\, k!}{c_k\, c'_{k'}\, (\delta^2)^{(k'-k)/2}\, k'!} \left(\frac{\gamma_k}{\gamma_{k'}}\right)^{\nu_0+T/2} \left(\frac{|Q_{k'}|\, |\Sigma_{\rho_k}|}{|Q_k|\, |\Sigma_{\rho_{k'}}|}\right)^{1/2} \frac{q'(k \mid k')}{q'(k' \mid k)} = \frac{c_{k'}\, c'_k\, \lambda^{(k'-k)}\, k!}{c_k\, c'_{k'}\, (1+\delta^2)^{(k'-k)/2}\, k'!} \left(\frac{\gamma_k}{\gamma_{k'}}\right)^{\nu_0+T/2} \frac{q'(k \mid k')}{q'(k' \mid k)}, \quad (79)$

where the simplification again follows from the definition of $\Sigma_{\rho_k}$. Note that the acceptance probability in (78) is independent of the reflection coefficients, so that these quantities only need to be sampled once a proposed model move is accepted. The sampling step for the approximate posterior distribution described above defines a Markov chain transition kernel $K'(k', \rho'_{k'}, \sigma_{k'}^{2\prime} \mid k, \rho_k, \sigma_k^2)$ that has $p'(k, \rho_k, \sigma_k^2 \mid \delta^2, \lambda, x_{1:T})$ in (67) as invariant distribution, and is reversible, i.e.

$\int p'(k, d\rho_k, d\sigma_k^2 \mid \delta^2, \lambda, x_{1:T})\, K'(k', d\rho'_{k'}, d\sigma_{k'}^{2\prime} \mid k, \rho_k, \sigma_k^2) = \int p'(k', d\rho'_{k'}, d\sigma_{k'}^{2\prime} \mid \delta^2, \lambda, x_{1:T})\, K'(k, d\rho_k, d\sigma_k^2 \mid k', \rho'_{k'}, \sigma_{k'}^{2\prime}). \quad (80)$

This kernel can be used as a proposal to sample from the true posterior distribution in (65) in a MH step, i.e.

$q(k', d\rho'_{k'}, d\sigma_{k'}^{2\prime} \mid k, \rho_k, \sigma_k^2) = K'(k', d\rho'_{k'}, d\sigma_{k'}^{2\prime} \mid k, \rho_k, \sigma_k^2). \quad (81)$

The corresponding acceptance probability for this MH step becomes

$\alpha(k', \rho'_{k'}, \sigma_{k'}^{2\prime} \mid k, \rho_k, \sigma_k^2) = \min\{1, r(k', \rho'_{k'}, \sigma_{k'}^{2\prime} \mid k, \rho_k, \sigma_k^2)\}, \quad (82)$

with

$r(k', \rho'_{k'}, \sigma_{k'}^{2\prime} \mid k, \rho_k, \sigma_k^2) = \frac{\mathrm{LLR}(k', \rho'_{k'}, \sigma_{k'}^{2\prime})}{\mathrm{LLR}(k, \rho_k, \sigma_k^2)}, \quad (83)$

where $\mathrm{LLR}(k, \rho_k, \sigma_k^2)$ is the ratio of the true and approximate likelihood functions, i.e.

$\mathrm{LLR}(k, \rho_k, \sigma_k^2) \triangleq \frac{p(x_{1:T} \mid k, a_k = L(\rho_k), \sigma_k^2)}{p(x_{1:T} \mid k, a_k = U_k \rho_k, \sigma_k^2)}. \quad (84)$

Remark 3. The method described above uses the Markov chain transition kernel for the approximate distribution in (66) as a proposal to sample from the true distribution in (65) in a MH step. Using this methodology many different choices are possible for the approximate distribution and the corresponding sampling strategy, as long as the resulting Markov chain transition kernel is reversible, and the acceptance probability is correctly specified. For example, alternative formulations of $U_k$ may be considered, based on e.g. the current estimate of the reflection coefficients. The choices adopted here were found to give good results in practice.
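The accept/reject step mentioned above is simple because $Q_k$ is diagonal. A minimal sketch (mine, assuming NumPy) of drawing the reflection coefficients from the truncated Gaussian proposal (68):

```python
import numpy as np

def sample_reflection_coeffs(rho_hat, sigma2, q2, rng, max_tries=1000):
    # Each coefficient is drawn independently from N(rho_hat[i], sigma2*q2[i])
    # until it falls in the stable region (-1, 1), as in (68) and (77).
    rho = np.empty(len(rho_hat))
    for i, (m, v) in enumerate(zip(rho_hat, q2)):
        for _ in range(max_tries):
            cand = m + np.sqrt(sigma2 * v) * rng.standard_normal()
            if -1.0 < cand < 1.0:
                rho[i] = cand
                break
        else:
            raise RuntimeError("acceptance region too small")
    return rho
```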
To summarise, the sampling step for the AR parameters at iteration $i$ is presented below.

Algorithm 7 (Sampling the AR Parameters: Constrained)
1. Propose a new value for the model order $k' \sim q'(k \mid k^{(i-1)})$.
2. Propose a new value for the excitation variance $\sigma_{k'}^{2\prime} \sim q'(\sigma_k^2 \mid k')$ (see (76)).
3. If $u \sim \mathcal{U}_{[0,1]} \leq \alpha'(k', \sigma_{k'}^{2\prime} \mid k^{(i-1)})$ (see (78)): simulate $\rho'_{k'} \sim q'(\rho_k \mid k', \sigma_{k'}^{2\prime})$ (see (77)). Else: set $k' = k^{(i-1)}$, $\sigma_{k'}^{2\prime} = \sigma^{2(i-1)}_{k^{(i-1)}}$, $\rho'_{k'} = \rho^{(i-1)}_{k^{(i-1)}}$.
4. If $u \sim \mathcal{U}_{[0,1]} \leq \alpha(k', \rho'_{k'}, \sigma_{k'}^{2\prime} \mid k^{(i-1)}, \rho^{(i-1)}_{k^{(i-1)}}, \sigma^{2(i-1)}_{k^{(i-1)}})$ (see (82)): set $k^{(i)} = k'$, $\sigma^{2(i)}_{k^{(i)}} = \sigma_{k'}^{2\prime}$, $\rho^{(i)}_{k^{(i)}} = \rho'_{k'}$. Else: set $k^{(i)} = k^{(i-1)}$, $\sigma^{2(i)}_{k^{(i)}} = \sigma^{2(i-1)}_{k^{(i-1)}}$, $\rho^{(i)}_{k^{(i)}} = \rho^{(i-1)}_{k^{(i-1)}}$.

Remark 4. In experimental evaluations the rejection rate for the second MH step, given that the proposal for the first MH step is accepted [Footnote 2: The second MH step is always accepted if the first is rejected.], ranged between 30% and 50%, depending on the data and the values chosen for the fixed parameters of the algorithm, showing this to be an efficient sampling strategy.

D Sampling the Hyper-parameters

The posterior conditionals for the hyper-parameters $\delta^2, \lambda, \alpha^2$ follow straightforwardly from Bayes' rule, and are given by

$p(\delta^2 \mid k, \vartheta_k, \sigma_k^2) \propto c_k\; \mathcal{IG}\!\left(\delta^2;\, \nu_{\delta^2} + k/2,\; \gamma_{\delta^2} + \frac{1}{2\sigma_k^2}\, \vartheta_k^{\mathrm{T}} \Sigma_{\vartheta_k}^{-1} \vartheta_k\right) \quad (85)$

$p(\lambda \mid k) \propto c_\lambda\; \mathcal{G}a(\lambda;\, \nu_\lambda + k, \gamma_\lambda) \quad (86)$

$p(\alpha^2 \mid k, \sigma_k^2, x_{0,k}) = \mathcal{IG}\!\left(\alpha^2;\, \nu_{\alpha^2} + k/2,\; \gamma_{\alpha^2} + \frac{1}{2\sigma_k^2}\, x_{0,k}^{\mathrm{T}} \Sigma_{x_{0,k}}^{-1} x_{0,k}\right). \quad (87)$

In the unconstrained case $c_k = 1$, and the proportionality in (85) becomes an equality. Sampling for $\delta^2$ is then by standard procedures. In the constrained case, however, $c_k$ is a complex non-linear function of $\delta^2$. Sampling for $\delta^2$ can then be performed in a MH step with proposal distribution $q(\delta^2) \triangleq \mathcal{IG}(\delta^2;\, \nu_{\delta^2} + k/2,\; \gamma_{\delta^2} + \frac{1}{2\sigma_k^2} \vartheta_k^{\mathrm{T}} \Sigma_{\vartheta_k}^{-1} \vartheta_k)$ and corresponding MH acceptance probability $\alpha_{\delta^2}(\delta^{2\prime} \mid \delta^2) = \min\{1, c'_k / c_k\}$, where $c'_k$ is similar to $c_k$ with $\delta^2$ replaced by $\delta^{2\prime}$.

For large values of $k_{\max}$ the posterior conditional for $\lambda$ approximately becomes

$p(\lambda \mid k) \approx \mathcal{G}a(\lambda;\, \nu_\lambda + k, \gamma_\lambda + 1). \quad (88)$

Thus, $\lambda$ is here sampled using a mixture of two MH steps with proposal distributions

$q_1(\lambda \mid k) \triangleq \mathcal{G}a(\lambda;\, \nu_\lambda + k, \gamma_\lambda) \quad (89)$

$q_2(\lambda \mid k) \triangleq \mathcal{G}a(\lambda;\, \nu_\lambda + k, \gamma_\lambda + 1), \quad (90)$

and mixture weights $\beta$ and $1-\beta$, $0 < \beta < 1$, respectively. The first proposal is necessary to ensure the geometric ergodicity of the Markov chain (see Section IV-E), whereas the second provides a good approximation to the true posterior conditional in (86). For both these proposals the corresponding MH acceptance probability becomes $\alpha_{\lambda,i}(\lambda' \mid \lambda) = \min\{1, [p(\lambda' \mid k)\, q_i(\lambda \mid k)]/[p(\lambda \mid k)\, q_i(\lambda' \mid k)]\}$, $i = 1, 2$. Finally, the posterior conditional for $\alpha^2$ is an inverted-gamma distribution, sampling from which is by standard procedures; a minimal code sketch of these standard draws is given below.
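In the unconstrained case the draws implied by (85) and (87) are plain inverted-gamma simulations. A sketch (mine, assuming NumPy and $\Sigma_{x_{0,k}} = I_k$ as in Section III):

```python
import numpy as np

def sample_delta2(k, theta, Sigma_inv, sigma2, nu_d, gamma_d, rng):
    # Unconstrained case: (85) with c_k = 1 is the inverted-gamma
    # IG(nu_d + k/2, gamma_d + theta' Sigma^{-1} theta / (2 sigma2)).
    shape = nu_d + 0.5 * k
    scale = gamma_d + 0.5 * theta @ (Sigma_inv @ theta) / sigma2
    return scale / rng.gamma(shape)   # beta / Gamma(alpha, 1) is IG(alpha, beta)

def sample_alpha2(k, x0, sigma2, nu_a, gamma_a, rng):
    # (87) with Sigma_{x0,k} = I_k.
    shape = nu_a + 0.5 * k
    scale = gamma_a + 0.5 * (x0 @ x0) / sigma2
    return scale / rng.gamma(shape)
```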
E Convergence Results

The convergence results established in this section are for the unconstrained case, with the initial state assumed to be zero. Similar results can be obtained for the other cases of interest, but are omitted here for the sake of brevity. It is easy to prove that the algorithm converges, i.e. that the Markov chain $\{(k^{(i)}, a^{(i)}_{k^{(i)}}, \sigma^{2(i)}_{k^{(i)}}, \delta^{2(i)}, \lambda^{(i)}) : i \in \mathbb{N}\}$ is ergodic. A stronger result is proved here by showing that this chain converges geometrically to the required posterior distribution, with a rate depending on the starting point. The following main result holds true.

Theorem 1 (Geometric Ergodicity). Let $\{(k^{(i)}, \delta^{2(i)}, \lambda^{(i)}) : i \in \mathbb{N}\}$ be the Markov chain whose transition kernel has been described in Sections IV-B.1 and IV-D, and assume that $x_{1:T} \notin \mathrm{span}\{[X_k]_{1:T,j} : j = 1, \ldots, k_{\max}\}$. Then the Markov chain converges at a geometric rate to the probability distribution $p(k, \delta^2, \lambda \mid x_{1:T})$, i.e. for $p(k, \delta^2, \lambda \mid x_{1:T})$-almost all initial states $(k^{(0)}, \delta^{2(0)}, \lambda^{(0)}) \in K \times \mathbb{R}^+ \times \mathbb{R}^+$ there exist constants $C_{k^{(0)}, \delta^{2(0)}, \lambda^{(0)}} > 0$ and $\rho \in [0, 1)$, such that

$\left\| p^{(i)}(k, \delta^2, \lambda) - p(k, \delta^2, \lambda \mid x_{1:T}) \right\|_{\mathrm{TV}} \leq C_{k^{(0)}, \delta^{2(0)}, \lambda^{(0)}}\, \rho^{\lfloor i / k_{\max} \rfloor}, \quad (91)$

where $p^{(i)}(k, \delta^2, \lambda)$ is the distribution of $(k^{(i)}, \delta^{2(i)}, \lambda^{(i)})$, and $\|\cdot\|_{\mathrm{TV}}$ denotes the total variation norm [23].

Proof. See Appendix B.

Corollary 1. The distribution of the series $\{(k^{(i)}, a^{(i)}_{k^{(i)}}, \sigma^{2(i)}_{k^{(i)}}, \delta^{2(i)}, \lambda^{(i)}) : i \in \mathbb{N}\}$ converges geometrically towards $p(k, a_k, \sigma_k^2, \delta^2, \lambda \mid x_{1:T})$ at the same rate $\rho$.

In other words, independent of the starting point of the Markov chain, the distribution of the Markov chain converges at least at a geometric rate to the required equilibrium distribution $p(k, a_k, \sigma_k^2, \delta^2, \lambda \mid x_{1:T})$.

Remark 5. It is not possible to evaluate the constants $C_{k^{(0)}, \delta^{2(0)}, \lambda^{(0)}}$ and $\rho$ in practice, but Theorem 1 proves their existence. Some recent work to establish quantitative bounds on the convergence rate for a variety of Markov chain samplers is reported in [19, 20]. The geometric ergodicity of the chain ensures that a central limit theorem (CLT) is valid for the ergodic averages.

V Simulation Studies

A Effect of the Fixed Parameters of the Prior Distributions

Recall from Section III that the Bayesian models are specified by the fixed parameters $(\nu_0, \gamma_0, \nu_{\delta^2}, \gamma_{\delta^2}, \nu_\lambda, \gamma_\lambda, \nu_{\alpha^2}, \gamma_{\alpha^2})$. The purpose of this experiment is to determine suitable values for these parameters and to investigate the sensitivity of the estimation results to changes in their values. In all the experiments the parameters of the inverted-gamma prior on the excitation variance are set to $\nu_0 = \gamma_0 = 0$, so that $p(\sigma_k^2) \propto 1/\sigma_k^2$ is the uninformative Jeffreys' prior. An uninformative prior is also assigned to $\lambda$ by setting $\nu_\lambda = \varepsilon_1 + \tfrac{1}{2}$, $\gamma_\lambda = \varepsilon_2$, with $\varepsilon_1, \varepsilon_2 \ll 1$. More specifically, for the results reported here these values were set to $\varepsilon_1 = 0.001$, $\varepsilon_2 = 0.0001$. Since $\delta^2$ and $\alpha^2$ are scale parameters, they are assigned slightly informative priors by setting $\nu_{\delta^2} = \nu_{\alpha^2} = 2$ and choosing $\gamma_{\delta^2}, \gamma_{\alpha^2} > 0$. This choice for $\nu_{\delta^2}$ and $\nu_{\alpha^2}$ ensures an infinite variance, so that the crucial parameters affecting the priors are $\gamma_{\delta^2}$ and $\gamma_{\alpha^2}$. [Footnote 3: Recall that the mean of $\mathcal{IG}(\alpha, \beta)$ is $\beta/(\alpha - 1)$.] Here the weak influence of these parameters on the marginal posterior distributions $p(\delta^2 \mid x_{1:T})$, $p(\alpha^2 \mid x_{1:T})$ and $p(k \mid x_{1:T})$ is experimentally demonstrated.

For the data, $T = 100$ samples were simulated from a sixth order AR process with roots at $0.99\angle{\pm 0.1\pi}$, $0.9\angle{\pm 0.3\pi}$ and $0.85\angle{\pm 0.7\pi}$, and excitation variance $\sigma_6^2 = 10$. The initial state was assumed to be fixed and known and the sampler was run without enforcing stability for a total of $N_b + N_s = 500 + 5000$ iterations, with the fixed parameters of the algorithm set to $k_{\max} = 30$, $\zeta = 0.25$ and $\beta = 0.1$. $N_b$ is the burn-in period of the Markov chain, and samples generated during this period are discarded in subsequent posterior analyses. [Footnote 4: The burn-in period is an estimate of the number of sampling iterations before subsequent samples from the chain can be assumed to be representative of the desired posterior distribution. A discussion of convergence diagnostics falls outside the scope of this paper, but an overview can be found in e.g. [14].] The algorithm was initialised with a zero model order and random values for the other parameters. The estimates of $p(\delta^2 \mid x_{1:T})$ and $p(k \mid x_{1:T})$ for $\gamma_{\delta^2} = 1, 10, 100$ are depicted in Figure 1, and show that the estimation results are stable over a wide range of choices for $\gamma_{\delta^2}$. Similar results were obtained for $\gamma_{\alpha^2}$ in the case where the initial state is assumed to be unknown, using $\varsigma = 0.5$, $\xi = 0.5$ and $\sigma^2_{\mathrm{RW}} = 0.1$ for the additional fixed parameters of the algorithm.

[Figure 1 about here]
B Detection

In this experiment the model order detection performance of the methods developed in this paper is compared to that of standard criteria, such as AIC, BIC and MDL. The model selection rule for the standard criteria is of the form

$\hat{k} = \arg\min_{k \in K} \left\{ -\log p(x_{1:T} \mid k, \hat{\theta}_k^{\mathrm{ML}}) + P(k, T) \right\}. \quad (92)$

Thus, it entails the minimisation over $k \in K$ of a utility function that is the sum of a data term (the negative of the log-likelihood function evaluated at the ML estimate of the parameters), which decreases with an increase in the model order, and a penalty term $P(k, T)$, which increases with the model order. These criteria are all motivated by different arguments. AIC is based on expected information, BIC is an asymptotic Bayes factor, and MDL involves evaluating the minimum information required to transmit some data and a model which describes the data over a communications channel. For an unconstrained AR process for which the initial state is fixed and known, the model selection rules corresponding to AIC, BIC and MDL can be evaluated analytically [Footnote 5: If the initial state is unknown, the ML estimate of the parameters can still be found, as discussed in e.g. [13].], and are given by (see [7] for details)

$\hat{k}_{\mathrm{AIC}} = \arg\min_{k \in K} \left\{ \frac{T}{2} \log\left(x_{1:T}^{\mathrm{T}} R_k x_{1:T}\right) + k \right\} \quad (93)$

$\hat{k}_{\mathrm{BIC}} = \hat{k}_{\mathrm{MDL}} = \arg\min_{k \in K} \left\{ \frac{T}{2} \log\left(x_{1:T}^{\mathrm{T}} R_k x_{1:T}\right) + \frac{k}{2} \log T \right\}, \quad (94)$

with $R_k \triangleq I_T - X_k (X_k^{\mathrm{T}} X_k)^{-1} X_k^{\mathrm{T}}$. Thus, for AR processes BIC and MDL yield the same model selection rule, which is asymptotically consistent. On the other hand, AIC is not asymptotically consistent (see [11] for details). A code sketch of these rules follows at the end of this subsection.

For the comparison 100 realisations of $T$ samples each were simulated from a third order AR process with roots at $0.9$ and $0.5\angle{\pm 0.85\pi}$, and excitation variance $\sigma_3^2 = 10$. For each realisation $\hat{k}_{\mathrm{AIC}}$, $\hat{k}_{\mathrm{BIC}}$ and $\hat{k}_{\mathrm{MMAP}}$ were evaluated and stored. The initial state was assumed to be fixed and known, and stability was not enforced. The MCMC sampler to compute $\hat{k}_{\mathrm{MMAP}}$ was run for a total number of $N_b + N_s = 500 + 5000$ iterations, using values for the fixed parameters of the algorithm and the prior distributions similar to those in Section V-A. To investigate the effect of the size of the data set, the experiment was performed for a wide range of values for $T$. The results are summarised in Table 4, and show that the Bayesian method consistently outperforms the standard criteria. The detection accuracy for the Bayesian method and BIC/MDL approaches 100% as the data sets become large, confirming that these strategies are indeed asymptotically consistent. This is not the case for AIC, for which the detection accuracy stabilises at around 75%. The Bayesian method also gives more information than the classical techniques in the sense that the samples from the Markov chain can be used to construct empirical approximations to $p(k, \theta_k \mid x_{1:T})$ and its marginals for all $k \in K$.

[Table 4 about here]
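For reference, a sketch of the classical rules (93) and (94), assuming NumPy and a known initial state; `X_of(k)` returning $X_k$ of (4) is a hypothetical helper:

```python
import numpy as np

def classical_orders(x, X_of, k_max):
    # AIC and BIC/MDL model selection rules (93)-(94).
    T = len(x)
    aic, bic = np.empty(k_max + 1), np.empty(k_max + 1)
    for k in range(k_max + 1):
        if k == 0:
            rss = x @ x
        else:
            Xk = X_of(k)
            beta = np.linalg.lstsq(Xk, x, rcond=None)[0]
            rss = x @ x - x @ (Xk @ beta)   # x' R_k x, R_k = I - X (X'X)^{-1} X'
        aic[k] = 0.5 * T * np.log(rss) + k
        bic[k] = 0.5 * T * np.log(rss) + 0.5 * k * np.log(T)
    return int(np.argmin(aic)), int(np.argmin(bic))
```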
C Constrained Case

In this experiment the constrained and unconstrained estimation strategies are compared. For the results reported here the initial state was assumed to be fixed and known, and the fixed parameters of the algorithm and prior distributions were set to values similar to those in Section V-A. For the data a realisation of $T = 100$ samples was simulated from the AR process specified in Section V-A, such that the ML estimates of the poles for this realisation lie at $1.0026\angle{\pm 0.1004\pi}$, $0.9046\angle{\pm 0.2965\pi}$ and $0.8431\angle{\pm 0.6972\pi}$. The sampler was run on this realisation for a total number of $N_b + N_s = 500 + 5000$ iterations for both the constrained and unconstrained cases. The estimates of $p(k \mid x_{1:T})$ for both cases are depicted in Figure 2, and indicate that both algorithms correctly estimated the true model order (note that the histograms are quite different, though). Approximations to the MMSE estimates of the poles were also computed for both cases using the expression in (27), and are given by $0.9881\angle{\pm 0.1006\pi}$, $0.8642\angle{\pm 0.3063\pi}$ and $0.7822\angle{\pm 0.6846\pi}$ for the constrained case, and $1.0057\angle{\pm 0.1002\pi}$, $0.8956\angle{\pm 0.3039\pi}$ and $0.8471\angle{\pm 0.6906\pi}$ for the unconstrained case.

[Figure 2 about here]

D Southern Oscillation Index (SOI) Data

In this experiment the SOI data is analysed. The El Niño-Southern Oscillation is a climatological phenomenon that has been of some interest in climate change studies in recent decades. The SOI series, depicted in the top graph of Figure 3, consists of 540 observations, computed as the difference of the departure from the long-term monthly mean sea level pressures at Tahiti in the South Pacific and Darwin in Northern Australia. The monthly index spans the period from 1950 to 1995, and is related to sea surface temperatures.

[Figure 3 about here]

This data set was analysed using AR processes within a Bayesian setting before in [12]. The Bayesian model adopted there specifies a prior structure directly on the roots of the AR characteristic polynomial. This prior structure places upper bounds on the maximum number of real roots and complex conjugate root pairs, and explicitly allows for zero and unit roots. Stability is enforced by setting the prior to be zero for roots with moduli greater than one. Model uncertainty is implicitly accounted for as a consequence of allowing the roots to have zero moduli. The model and algorithms presented in this paper differ from those in [12] in a number of important ways. First, the prior distribution in (16) allows for arbitrary combinations of real roots and complex conjugate root pairs. The form of the Bayesian model adopted here also facilitates joint sampling strategies that lead to fast convergence. Furthermore, model order uncertainty is accounted for explicitly in the prior structure, facilitating the design of efficient moves to explore the joint model parameter space. Finally, as illustrated by the results in Section V-A, the estimation performance is robust to a wide range of choices for the fixed parameters of the prior distribution.

[Figure 4 about here]

The results presented in [12] favour model orders ranging from 8 to 17 with 3 to 5 complex components. The mode of the posterior distribution of the model order is at 11. However, in the decomposition of the time series only one latent component was found to be clearly dominant, suggesting that the data may be described by a more parsimonious model. This is indeed confirmed by the results of the analysis performed here, which favour a model order of 3 (the order chosen by BIC is 2), as depicted by the estimate of $p(k \mid x_{1:T})$ in Figure 4. These results were obtained by running the sampler for a total number of $N_b + N_s = 500 + 5000$ iterations. The initial state was assumed to be unknown, and stability was not enforced. Values similar to those in Section V-A were used for the fixed parameters of the algorithm and the prior distribution, except for $k_{\max}$, which was set to 40, as is the case in [12]. Similar results were obtained when explicitly enforcing stability.
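Pole summaries such as those reported in this section can be read off the AR characteristic polynomial of sampled or estimated coefficients; a small sketch, assuming NumPy:

```python
import numpy as np

def ar_poles(a):
    # Roots of the characteristic polynomial z^k - a_1 z^{k-1} - ... - a_k,
    # i.e. the poles of the AR process (2); moduli < 1 indicate stability.
    return np.roots(np.concatenate([[1.0], -np.asarray(a)]))

def modulus_angle(poles):
    # Convenience: (modulus, angle in radians) pairs as quoted in the text.
    return [(abs(p), np.angle(p)) for p in poles]
```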
The approximate MMSE roots for the model with $k = 3$ were calculated according to the expression in (27), and yielded one real root with modulus 0.8361 and one complex conjugate root pair at $0.3622\angle{\pm 0.6838\pi}$. The corresponding latent components are depicted in the bottom and middle graphs of Figure 3, and confirm the fact that there is only one clearly dominant latent component.

E Speech Data

In this final experiment the algorithms developed in this paper are applied to speech data to investigate the importance of incorporating model uncertainty within the context of speech modelling using AR processes. The speech data analysed here is depicted in the top graph of Figure 5, and is an utterance of the sentence "Good service should be rewarded by big tips." by a male American speaker, taken from the TIMIT database.

[Figure 5 about here]

Following from the quasi-stationary assumption of speech, where the signal is assumed to be stationary over short periods of time, the data in Figure 5 was processed in frames of 300 samples (18.75 msec.), overlapping by 100 samples. For each frame the sampler was run for a total number of $N_b + N_s = 500 + 5000$ iterations, using values similar to those in Section V-A for the fixed parameters of the algorithm and prior distribution. Stability was explicitly enforced and the initial state was assumed to be fixed and known. The bottom graph of Figure 5 depicts the MMAP estimated model order for the frames comprising the speech signal in the top graph, and exhibits a high degree of variation over the utterance. The median for the MMAP model order is 7, which is much lower than the values normally used in fixed-order speech processing applications (10 and 12 are common choices). A further benefit of the methods presented here is that they should reduce the musical artifacts associated with overfitting when modelling speech using fixed-order AR processes, especially in speech enhancement applications where the speech signal is observed in noise.

VI Conclusions

In this paper the model order determination problem for AR processes was posed within a Bayesian framework. Original hierarchical prior models were proposed that allow the stability of the model to be enforced and account for a possible unknown initial state. These models were shown to be robust for a wide range of values for their fixed hyper-parameters. Stochastic reversible jump MCMC algorithms were developed to implicitly perform the integration required to obtain the marginal posterior model order probabilities. The simulation results presented in Section V indicate that the algorithms are accurate and computationally efficient, and compare favourably with standard model selection criteria.

A Notation and Abbreviations

A Notation

$v$ — column vector, i.e. $v \triangleq (v_1, \ldots, v_{n_v})$
$v_{i:j}$ — sub-vector, i.e. $v_{i:j} \triangleq (v_i, \ldots, v_j)$
$M$ — matrix
$[M]_{i:j,k:l}$ — sub-matrix formed by rows $i$ to $j$ and columns $k$ to $l$ of $M$
$v^{\mathrm{T}}$, $M^{\mathrm{T}}$ — vector and matrix transposes
$M^{-1}$ — inverse of a (square) matrix
$|M|$ — determinant of a (square) matrix
$0_{m \times n}$ — $m \times n$ matrix of zeros
$I_n$ — $n \times n$ identity matrix
$\mathrm{diag}(v_{p:q})$ — diagonal matrix with the elements of $v_{p:q}$ on the diagonal
$x \sim p(x)$ — $x$ is distributed according to $p(x)$
$x_t \sim_{\mathrm{iid}} p(x)$ — $\{x_t\}$ is an i.i.d. sequence distributed according to $p(x)$
$\mathbb{E}_{p(x)}[f(x)]$ — expectation of $f(x)$ w.r.t. $p(x)$, i.e. $\mathbb{E}_{p(x)}[f(x)] = \int f(x)\, p(dx)$
$\mathbb{I}_S(x)$ — indicator function for the set $S$, i.e. $\mathbb{I}_S(x) = 1 \iff x \in S$ and 0 otherwise
$\delta_a(dx)$ — Dirac delta function, i.e. $\int_A \delta_a(dx) = 1 \iff a \in A$ and 0 otherwise
$\exp(\cdot)$ — exponential function
$\log(\cdot)$ — natural logarithmic function, i.e. to the base $e$
$\mathrm{erf}(\cdot)$ — Gaussian error function
$\mathbb{R}^n$ — $n$-dimensional Euclidean space

Distribution Notation [Footnote 1: An alternative notation omits the variable of interest when no ambiguities arise.]

Gaussian: $\mathcal{N}(x; \mu, \Sigma) = |2\pi\Sigma|^{-1/2} \exp\left(-\tfrac{1}{2}(x-\mu)^{\mathrm{T}} \Sigma^{-1} (x-\mu)\right)$
Inverted-gamma: $\mathcal{IG}(x; \alpha, \beta) = \frac{\beta^\alpha}{\Gamma(\alpha)}\, x^{-(\alpha+1)} \exp(-\beta/x)$
Gamma: $\mathcal{G}a(x; \alpha, \beta) = \frac{\beta^\alpha}{\Gamma(\alpha)}\, x^{\alpha-1} \exp(-\beta x)$
Poisson: $\mathcal{P}n(x; \lambda) = \exp(-\lambda)\, \frac{\lambda^x}{x!}$

B Abbreviations

AIC — Akaike's information criterion
AR — autoregressive
BIC — Bayes' information criterion
i.i.d. — independently and identically distributed
MCMC — Markov chain Monte Carlo
MDL — minimum description length
MH — Metropolis-Hastings
ML — maximum likelihood
MMAP — marginal maximum a posteriori
MMSE — minimum mean square error
SLLN — strong law of large numbers

B Proof of Convergence

The proof of Theorem 1 in Section IV-E relies on the following theorem, which is a result of Theorems 14.0.1 and 15.0.1 in [15].

Theorem 2. Suppose that a Markov transition kernel $K$ on a space $X$
1. is a $\varphi$-irreducible (for some measure $\varphi$) aperiodic Markov transition kernel with invariant distribution $\pi$, and
2. has geometric drift towards a small set $C$, with drift function $V : X \to [1, +\infty)$, i.e. there exist some constants $0 < \varrho < 1$, $b > 0$ and $k_0$, and an integrable measure $\nu$, such that

$KV(x) \leq \varrho V(x) + b\, \mathbb{I}_C(x) \quad (95)$

$K^{(k_0)}(dx' \mid x) \geq \mathbb{I}_C(x)\, \nu(dx'). \quad (96)$

Then, for $\pi$-almost all initial points $x_0$, and some constants $\rho < 1$ and $R < +\infty$,

$\|K^n(\cdot \mid x_0) - \pi(\cdot)\|_{\mathrm{TV}} \leq R\, V(x_0)\, \rho^n, \quad (97)$

i.e. $K$ is $\pi$-almost everywhere geometrically ergodic.

Several lemmas and propositions need to be proved to establish the conditions required to apply Theorem 2. These are outlined below.
1. Lemmas 1 and 4 establish bounds for $\gamma_k$ and the ratios $p(\lambda \mid k)/q_i(\lambda \mid k)$, $i = 1, 2$, respectively. These bounds are required by the proofs of a number of the other lemmas and propositions.
2. Lemmas 2, 3 and 5 establish local minorisation conditions for the transition kernels for $k$, $\delta^2$ and $\lambda$, respectively, whereas Lemma 6 establishes a minorisation condition for the transition kernel in the empty configuration ($k = 0$). These lemmas lead to the proof of Proposition 1, which establishes the minorisation condition in (96) for some $k_0$ and measure $\nu$ (to be described).
3. The $\varphi$-irreducibility and aperiodicity of the Markov chain are proved in Corollary 2, leading to the simple convergence of the Markov chain.
4. To complete the proof, Lemma 7 and Proposition 2 establish the drift condition in (95).

Before proceeding, some notation needs to be introduced. Let $K(k', d\delta^{2\prime}, d\lambda' \mid k, \delta^2, \lambda)$ denote the transition kernel of the Markov chain, i.e. for a fixed $(k, \delta^2, \lambda) \in K \times \mathbb{R}^+ \times \mathbb{R}^+$,

$\Pr\left((k', \delta^{2\prime}, \lambda') \in A \mid k, \delta^2, \lambda\right) = \int_A K(k', d\delta^{2\prime}, d\lambda' \mid k, \delta^2, \lambda), \quad (98)$

where $A \in \{k'\} \times \mathcal{B}(\mathbb{R}^+ \times \mathbb{R}^+)$. By construction (see Sections IV-B.1 and IV-D) this transition kernel is a product of transition kernels, given by

$K(k', d\delta^{2\prime}, d\lambda' \mid k, \delta^2, \lambda) = K_k(k' \mid k)\, K_{\delta^2}(d\delta^{2\prime} \mid \delta^2, k')\, K_\lambda(d\lambda' \mid \lambda, k'), \quad (99)$

where the transition kernels on the right-hand side follow as

$K_k(k' \mid k) = q(k' \mid k) \min\{1, r_u(k' \mid k)\} + \delta_k(k')\left(1 - \sum_{k^* \in K} q(k^* \mid k) \min\{1, r_u(k^* \mid k)\}\right) \quad (100)$

$K_{\delta^2}(d\delta^{2\prime} \mid \delta^2, k') = p(\delta^{2\prime} \mid \delta^2, k', x_{1:T})\, d\delta^{2\prime} \quad (101)$

$K_\lambda(d\lambda' \mid \lambda, k') = \beta K_{1,\lambda}(d\lambda' \mid \lambda, k') + (1-\beta) K_{2,\lambda}(d\lambda' \mid \lambda, k'), \quad (102)$

with

$K_{i,\lambda}(d\lambda' \mid \lambda, k') = q_i(\lambda' \mid k') \min\left\{1, \frac{p(\lambda' \mid k')/q_i(\lambda' \mid k')}{p(\lambda \mid k')/q_i(\lambda \mid k')}\right\} d\lambda' + \delta_\lambda(d\lambda')\left(1 - \int_{\mathbb{R}^+} q_i(\lambda^* \mid k') \min\left\{1, \frac{p(\lambda^* \mid k')/q_i(\lambda^* \mid k')}{p(\lambda \mid k')/q_i(\lambda \mid k')}\right\} d\lambda^*\right), \quad i = 1, 2. \quad (103)$

The sampling procedures for the model order $k$ and the hyper-parameters $\delta^2$ and $\lambda$ are described in Sections IV-B.1 and IV-D, respectively.

Lemma 1. Assume that $x_{1:T} \notin \mathrm{span}\{[X_k]_{1:T,j} : j = 1, \ldots, k_{\max}\}$. Then, for any $\gamma_0 \in \mathbb{R}^+$ and $k \in K$, there exists an $\varepsilon_1 > 0$, such that

$\varepsilon_1 \leq \gamma_k \leq \gamma_0 + \tfrac{1}{2}\, x_{1:T}^{\mathrm{T}} x_{1:T}. \quad (104)$

Proof.
Lemma 1. Assume that x_{1:T} ∉ span{[X_k]_{1:T,j} : j = 1, ..., k_max}. Then, for any ν_0 ∈ R⁺ and k ∈ K, there exists an ε₁ > 0, such that for all δ² ∈ R⁺,

ε₁ ≤ γ_k ≤ ν_0 + ½ x_{1:T}^T x_{1:T},  (104)

where γ_k ≜ ν_0 + ½ x_{1:T}^T P_k x_{1:T}.

Proof. Let P̄_k denote the matrix P_k for which δ² → +∞. Then x_{1:T}^T P̄_k x_{1:T} > 0 for any k ∈ K, since x_{1:T} ∉ span{[X_k]_{1:T,j} : j = 1, ..., k_max}. The result follows by noting that x_{1:T}^T P_k x_{1:T} = (1/(1+δ²)) x_{1:T}^T x_{1:T} + (δ²/(1+δ²)) x_{1:T}^T P̄_k x_{1:T}. ∎

Lemma 2. There exist constants M₁ > 0 and M₂ > 0, such that for any k ∈ {1, ..., k_max} and δ² ∈ R⁺,

K_k(k−1 | k) ≥ (k/(M₁M₂)) q(k−1 | k) I_{{λ : λ < M₂}}(λ).  (105)

Proof. From Lemma 1 and the expression for the acceptance ratio of the death move in Section IV-B.1, there exists a constant M₁ > 0 (depending only on ε₁, ν_0, T and x_{1:T}), such that

r_u(k−1 | k) ≥ (k/λ) (ε₁ / (ν_0 + ½ x_{1:T}^T x_{1:T}))^{ν_0+T/2} ≜ k/(λM₁).  (107)

Thus, there exists a constant M₂ > 0 large enough, such that for any λ < M₂,

k/(M₁M₂) ≤ min{1, r_u(k−1 | k)} ≤ 1.  (108)

The result follows. ∎

Lemma 3. There exist a constant ε₂ > 0 and a probability density ψ, such that for any k ∈ K and δ² ∈ R⁺,

K_{δ²}(dδ²' | δ², k) ≥ ε₂ ψ(δ²') dδ²'.  (109)

Proof. From the definition of the transition kernel in (101) it follows that

K_{δ²}(dδ²' | δ², k) = p(δ²' | δ², k, x_{1:T}) dδ²'.  (110)

According to the sampling strategy described in Section IV-D, the density on the right hand side follows as

p(δ²' | δ², k, x_{1:T}) = ∫_{R^k × R⁺} p(δ²' | k, a_k, σ_k²) p(a_k, σ_k² | k, δ², x_{1:T}) da_k dσ_k²,  (111)

with

p(δ²' | k, a_k, σ_k²) = IG(δ²'; α_{δ²} + k/2, β_{δ²} + a_k^T Σ_{a_k}^{-1} a_k / (2σ_k²))  (112)

and

p(a_k, σ_k² | k, δ², x_{1:T}) = p(a_k | k, σ_k², δ², x_{1:T}) p(σ_k² | k, δ², x_{1:T}) = N(a_k; â_k, σ_k² M_k) IG(σ_k²; ν_0 + T/2, γ_k).  (113)

After a few algebraic manipulations (completing the square in a_k; expression (114), compressed here), the integrand in (111) can be bounded from below in terms of the quantities

M̃_k ≜ ((1 + 1/δ² + 1/δ²') X_k^T X_k)^{-1}  (115)

ã_k ≜ M̃_k X_k^T x_{1:T}  (116)

P̃_k ≜ I_T − X_k M̃_k X_k^T.  (117)

Integrating the resulting expression over the admissible regions for a_k and σ_k², and bounding the remaining δ²- and k-dependent factors by their worst-case values over K (using Lemma 1 for γ_k), leads to a lower bound of the form

p(δ²' | δ², k, x_{1:T}) ≥ ε₂ ψ(δ²'), with ψ(δ²') ∝ (1 + 1/δ²')^{−k_max/2} (δ²')^{−(α_{δ²}+1)} exp(−β_{δ²}/δ²'),  (118)

where ε₂ > 0 does not depend on δ² or k. The result follows with the obvious definitions for ε₂ and ψ. ∎

Lemma 4. There exist positive constants ε_{M₂} > 0 and M₃ < +∞, such that for any k ∈ K and λ < M₂ < +∞,⁶

ε_{M₂} ≤ p(λ | k) / q₁(λ | k) ≤ M₃.  (119)

In the case where λ is unbounded, the lower bound becomes zero, whereas the upper bound remains at M₃. The ratio p(λ | k)/q₂(λ | k) is not bounded from above.

Proof. The result follows straightforwardly from the definitions of p(λ | k) and q_i(λ | k), i = 1, 2, in Section IV-D. ∎

⁶The result holds for any finite constant in R⁺. For convenience the constant is here taken to be M₂, as defined in Lemma 2.

Lemma 5. For any k ∈ K and λ ∈ R⁺,

K_λ(dλ' | λ, k) ≥ β (ε_{M₂}/M₃) q₁(λ' | k) I_{{λ' : λ' < M₂}}(λ') dλ'.  (120)

Proof. The result follows from the definition of K_λ in (102)-(103), keeping only the q₁ component and bounding the acceptance probability from below by ε_{M₂}/M₃ on {λ' < M₂}, using Lemma 4. ∎

Lemma 6. There exist a constant ε₄ > 0 and a probability measure φ₀, such that for any (δ², λ) ∈ R⁺ × R⁺,

K(0, dδ²', dλ' | 0, δ², λ) ≥ ε₄ φ₀(dδ²', dλ').  (121)

Proof. From the definition of the transition kernel in (99) it follows that

K(0, dδ²', dλ' | 0, δ², λ) ≥ q(0 | 0) min{1, r_u(0 | 0)} p(δ²' | δ², 0, x_{1:T}) dδ²' β q₁(λ' | 0) min{1, (p(λ' | 0)/q₁(λ' | 0)) / (p(λ | 0)/q₁(λ | 0))} dλ'
≥ q(0 | 0) ε₂ ψ(δ²') β (ε_{M₂}/M₃) q₁(λ' | 0) I_{{λ' : λ' < M₂}}(λ') dδ²' dλ',

where use has been made of Lemmas 3 and 4, and of the fact that r_u(0 | 0) = 1. The result follows with the obvious definitions for ε₄ and φ₀. ∎

Proposition 1. There exist a constant ε₅ > 0 and a probability measure ν, such that for all (k, δ², λ) ∈ K × R⁺ × R⁺,

K^{(k_max)}(k', dδ²', dλ' | k, δ², λ) ≥ ε₅ I_{{λ : λ < M₂}}(λ) ν(k', dδ²', dλ').  (122)

Proof. The result follows by composing the local minorisation conditions of Lemmas 2, 3 and 5 over at most k_max successive death moves, which drive the chain to the empty configuration whatever its starting point (note that Lemma 5 guarantees λ' < M₂ at each step), and applying Lemma 6 once the empty configuration is reached. ∎

Corollary 2. The transition kernel K is ν-irreducible. Since p(k, dδ², dλ | x_{1:T}) is an invariant distribution of K, and the Markov chain is ν-irreducible, it follows from [23, Theorem 1, p. 1758] that the Markov chain is p(k, dδ², dλ | x_{1:T})-irreducible.
Aperiodicity is straightforward to prove. Indeed, there is a non-zero probability of remaining in the empty configuration once it has been reached. Therefore, the Markov chain admits p(k, dδ², dλ | x_{1:T}) as its unique equilibrium distribution [23, Theorem 1, p. 1758]. This result establishes the simple ergodicity of the Markov chain.

Lemma 7. Let V(λ) ≜ max{1, λ}. Then for any k ∈ K and β > 0,

lim_{λ→+∞} K_λ V(λ) / V(λ) = 1 − β,  (127)

where by definition

K_λ V(λ) ≜ ∫_{R⁺} K_λ(dλ' | λ, k) V(λ').  (128)

Proof. From the definition of the transition kernel in (102) it follows that

K_λ V(λ) / V(λ) = β K_{1,λ} V(λ) / V(λ) + (1 − β) K_{2,λ} V(λ) / V(λ).  (129)

For i = 1, 2 the definition of the mixture transition kernels in (103) leads to

K_{i,λ} V(λ) / V(λ) = (1/V(λ)) ∫_{R⁺} q_i(λ' | k) min{1, (p(λ' | k)/q_i(λ' | k)) / (p(λ | k)/q_i(λ | k))} V(λ') dλ' + r_i(λ),  (130)

with

r_i(λ) ≜ 1 − ∫_{R⁺} q_i(λ' | k) min{1, (p(λ' | k)/q_i(λ' | k)) / (p(λ | k)/q_i(λ | k))} dλ'.

Since the integrals in the above expressions are finite, i.e.

∫_{R⁺} q_i(λ' | k) min{1, ·} V(λ') dλ' ≤ ∫_{R⁺} q_i(λ' | k) V(λ') dλ' < +∞,

the first term in (130) vanishes as λ → +∞, so that

lim_{λ→+∞} K_{i,λ} V(λ) / V(λ) = lim_{λ→+∞} r_i(λ) = { 0 if i = 1; 1 if i = 2 },  (131)

where use has been made of Lebesgue's dominated convergence theorem to reverse the order of the limit and integral operations, and of the bounds established in Lemma 4. The result immediately follows. ∎

Proposition 2. Let V(k, δ², λ) = V(λ) ≜ max{1, λ}. Then for β > 0,

lim_{λ→+∞} KV(k, δ², λ) / V(k, δ², λ) = 1 − β < 1,  (132)

where by definition

KV(k, δ², λ) ≜ Σ_{k'∈K} ∫_{R⁺×R⁺} K(k', dδ²', dλ' | k, δ², λ) V(k', δ²', λ').  (133)

Proof. From the definition of the transition kernel in (99), and since V depends on λ only,

KV(k, δ², λ) = Σ_{k'∈K} K_k(k' | k) ∫_{R⁺} p(δ²' | δ², k', x_{1:T}) dδ²' ∫_{R⁺} K_λ(dλ' | λ, k') V(λ') = Σ_{k'∈K} K_k(k' | k) K_λ V(λ).  (134)

Dividing by V(k, δ², λ) and taking the limit result in

lim_{λ→+∞} KV(k, δ², λ) / V(k, δ², λ) = Σ_{k'∈K} K_k(k' | k) lim_{λ→+∞} K_λ V(λ) / V(λ) = (1 − β) Σ_{k'∈K} K_k(k' | k) = 1 − β < 1,  (135)

where use has been made of Lemma 7. ∎

The proof of Theorem 1 is now immediate. By construction the kernel K(k', dδ²', dλ' | k, δ², λ) admits p(k, dδ², dλ | x_{1:T}) as invariant distribution. Proposition 1 proved the ν-irreducibility and the minorisation condition in (96) with k_0 = k_max, and Proposition 2 proved the drift condition in (95). Thus, Theorem 2 applies. ∎

C Levinson Recursions

The Levinson recursions define a non-linear invertible mapping between the AR and reflection coefficients. These recursions are given by the following matrix equation (see [8] for details):

[ 1    ρ_i ] [ 1  a_{1,i-1}    ...  a_{j,i-1}    ...  a_{i-1,i-1}  0 ]
[ ρ_i  1   ] [ 0  a_{i-1,i-1}  ...  a_{i-j,i-1}  ...  a_{1,i-1}   1 ]

    = [ 1        a_{1,i}    ...  a_{j,i}    ...  a_{i-1,i}  a_{i,i} ]
      [ a_{i,i}  a_{i-1,i}  ...  a_{i-j,i}  ...  a_{1,i}    1       ]  (136)

Solving these equations, the transformation from ρ_k to a_k is obtained as the recursions given in the algorithm below.

Algorithm 8 (Levinson Recursions, a_k = L(ρ_k))
For i = 1, ..., k:
  a_{i,i} = ρ_i
  For j = 1, ..., i−1:
    a_{j,i} = a_{j,i−1} − ρ_i a_{i−j,i−1}
  End
End

The inverse transformation from a_k to ρ_k follows in a similar fashion, and is given by the recursions in the algorithm below.

Algorithm 9 (Inverse Levinson Recursions, ρ_k = L^{-1}(a_k))
For i = k, ..., 1:
  ρ_i = a_{i,i}
  For j = 1, ..., i−1:
    a_{j,i−1} = (a_{j,i} + ρ_i a_{i−j,i}) / (1 − ρ_i²)
  End
End
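As a cross-check on Algorithms 8 and 9, a direct transcription into code follows. This is a minimal sketch (plain NumPy, with function names of our choosing) that can be used to verify that the two mappings are mutual inverses, and that stability corresponds to all reflection coefficients lying in (−1, 1).

    import numpy as np

    def levinson(rho):
        # Algorithm 8: map reflection coefficients rho_1..rho_k to AR
        # coefficients a_{1,k}..a_{k,k}, i.e. a_k = L(rho_k).
        k = len(rho)
        a = np.zeros(k)                        # a[j-1] holds a_{j,i}
        for i in range(1, k + 1):
            prev = a[: i - 1].copy()           # order-(i-1) coefficients
            for j in range(1, i):
                a[j - 1] = prev[j - 1] - rho[i - 1] * prev[i - j - 1]
            a[i - 1] = rho[i - 1]              # a_{i,i} = rho_i
        return a

    def inverse_levinson(a):
        # Algorithm 9: recover rho_k = L^{-1}(a_k) by the step-down
        # recursion; requires |rho_i| != 1 at every step.
        a = np.array(a, dtype=float)
        rho = np.zeros(len(a))
        for i in range(len(a), 0, -1):
            rho[i - 1] = a[i - 1]              # rho_i = a_{i,i}
            prev = np.zeros(i - 1)
            for j in range(1, i):
                prev[j - 1] = (a[j - 1] + rho[i - 1] * a[i - j - 1]) / (1.0 - rho[i - 1] ** 2)
            a = prev
        return rho

    # Round trip: the model is stable iff all |rho_i| < 1.
    rho = np.array([0.5, -0.3, 0.2])
    assert np.allclose(inverse_levinson(levinson(rho)), rho)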
References

[1] H. Akaike. A new look at the statistical model identification. IEEE Transactions on Automatic Control, AC-19:716-723, 1974.
[2] C. Andrieu. MCMC Methods for Bayesian Analysis of Nonlinear Parametric Regression Problems. Applications to Spectral Analysis and Impulsive Deconvolution. PhD thesis, University of Cergy-Pontoise, Paris XV, 1998. In French.
[3] M. M. Barbieri and A. O'Hagan. A reversible jump MCMC sampler for Bayesian analysis of ARMA time series. Technical report, Dipartimento di Statistica, Probabilità e Statistiche Applicate, Università "La Sapienza", Roma, and Department of Mathematics, University of Nottingham, 1996.
[4] G. Barnett, R. Kohn, and S. Sheather. Bayesian estimation of an autoregressive model using Markov chain Monte Carlo. Journal of Econometrics, 74(2):237-254, 1996.
[5] G. E. P. Box, G. M. Jenkins, and G. C. Reinsel. Time Series Analysis, Forecasting and Control, Third Edition. Prentice-Hall, Englewood Cliffs, 1994.
[6] J. R. Deller, J. G. Proakis, and J. H. L. Hansen. Discrete-Time Processing of Speech Signals. Macmillan Publishing Company, Englewood Cliffs, 1993.
[7] P. M. Djurić. Asymptotic MAP criteria for model selection. IEEE Transactions on Signal Processing, 46(10):2726-2735, 1998.
[8] B. Friedlander. Lattice filters for adaptive processing. Proceedings of the IEEE, 70(8):829-867, 1982.
[9] E. I. George and D. P. Foster. Calibration and empirical Bayes variable selection. Technical report, Department of Management Science and Information Systems, University of Texas, 1997.
[10] P. J. Green. Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika, 82(4):711-732, 1995.
[11] F. Gustafsson and H. Hjalmarsson. Twenty-one ML estimators for model selection. Automatica, 31(10):1377-1392, 1995.
[12] G. Huerta and M. West. Priors and component structures in autoregressive time series models. Journal of the Royal Statistical Society, Series B, 61:881-899, 1999.
[13] L. T. McWhorter and L. L. Scharf. Nonlinear maximum likelihood estimation of autoregressive time series. IEEE Transactions on Signal Processing, 43(12):2909-2919, 1995.
[14] K. L. Mengersen, C. P. Robert, and C. Guihenneuc-Jouyaux. MCMC convergence diagnostics: a review. In J. M. Bernardo, J. O. Berger, A. P. Dawid, and A. F. M. Smith, editors, Bayesian Statistics 6, pages 395-402. Oxford University Press, 1999.
[15] S. P. Meyn and R. L. Tweedie. Markov Chains and Stochastic Stability. Springer-Verlag, New York, 1993.
[16] S. Richardson and P. J. Green. On Bayesian analysis of mixtures with an unknown number of components. Journal of the Royal Statistical Society, Series B, 59(4):731-758, 1997.
[17] J. Rissanen. Modeling by shortest data description. Automatica, 14:465-478, 1978.
[18] C. P. Robert and G. Casella. Monte Carlo Statistical Methods. Springer-Verlag, New York, 1999.
[19] G. O. Roberts and R. Tweedie. Bounds on regeneration times and convergence rates for Markov chains. Stochastic Processes and their Applications, 80:211-229, 1999.
[20] J. S. Rosenthal. Minorisation conditions and convergence rates for Markov chain Monte Carlo. Journal of the American Statistical Association, 90:558-566, 1995.
[21] G. Schwarz. Estimating the dimension of a model. The Annals of Statistics, 6(2):461-464, 1978.
[22] M. Smith and R. Kohn. Nonparametric regression using Bayesian variable selection. Journal of Econometrics, 75(2):317-343, 1996.
[23] L. Tierney. Markov chains for exploring posterior distributions (with discussion). The Annals of Statistics, 22(4):1701-1762, 1994.
[24] P. T. Troughton and S. J. Godsill. A reversible jump sampler for autoregressive time series. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, volume 4, pages 2257-2260, 1998.
[25] A. Zellner. On assessing prior distributions and Bayesian regression analysis with g-prior distributions. In P. Goel and A. Zellner, editors, Bayesian Inference and Decision Techniques, pages 233-243. Elsevier, 1986.
T        35   50   75   100  200  300
AIC      20   31   49   59   74   76
BIC/MDL  19   30   46   57   76   94
MMAP     23   33   49   64   78   95

Table 4: Detection performance results (in accuracy percentage).
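For completeness, a minimal sketch of classical baselines of the kind reported in Table 4 is given below. It fits each candidate order by least squares and applies the standard AIC and BIC/MDL penalty forms; the exact estimators and small-sample corrections used in Section V may differ, so this is illustrative rather than a reproduction of the paper's implementation.

    import numpy as np

    def ar_residual_var(x, k):
        # Least-squares AR(k) fit of x_t on x_{t-1..t-k}; returns the
        # residual variance estimate (sample variance for k = 0).
        if k == 0:
            return float(np.var(x))
        X = np.column_stack([x[k - i - 1 : len(x) - i - 1] for i in range(k)])
        y = x[k:]
        coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
        return float(np.mean((y - X @ coeffs) ** 2))

    def select_order(x, k_max, criterion="aic"):
        # Pick the order minimising T log(sigma^2) + penalty, with
        # penalty 2k (AIC) or k log T (BIC/MDL), up to additive constants.
        T = len(x)
        scores = []
        for k in range(k_max + 1):
            s2 = ar_residual_var(x, k)
            penalty = 2 * k if criterion == "aic" else k * np.log(T)
            scores.append(T * np.log(s2) + penalty)
        return int(np.argmin(scores))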
